Designing Intuitive Applications
by Daniel Plemmons and Paul Mandel
In Introduction to Motion Control, we discussed media and cultural touchstones that have shaped popular perception of motion control experiences, what it means for an experience to be intuitive, and some heuristics to help us give our designs a critical lens. (If you haven’t read Introduction to Motion Control, it’s a quick read and provides a good sense of the sorts of design challenges you’re likely to encounter building a motion control application. Check it out!)
In this article, we’ll take a deeper dive into each of these three topics and also link out to some good external reading to help you in designing and developing your Leap Motion applications.
The Leap Motion Controller doesn’t exist in a vacuum. The vast array of motion control concepts, devices, and experiences that have been imagined and created offer a rich history to learn from. Before looking at design specifically for the Leap Motion platform, we’ll take a short survey of motion controls used in fiction, gaming, and art. An understanding of what made these designs successful is hugely valuable to your own design process.
In modern pop media, motion and voice controls, large touchscreens, and holographic displays seem to be standard issue for any optimistic depiction of the future. Their (often) blue glow is a signal of powerful, advanced technology. Chris Noessel’s fabulous analysis of motion control design in popular media, which he wrote for Smashing Magazine in 2013, is arguably required reading for any designer looking to build motion control applications.
But why are we so attracted to motion controls? Depictions in popular media often feature a genius hero wielding some manner of great power. Tony “Iron Man” Stark, in his workshop full of holograms and motion controllers, designs amazing contraptions with a wave of his hand. In Firefly, Simon Tam scans and analyzes his sister’s brain in real time, diagnosing her condition in mere moments with motion-controlled medical equipment. These depictions all play into fantasies of power and mastery, aesthetics that many people have come to expect from motion control experiences. It’s not the only reason, but many consumers want motion control because they believe it will give them this sense of power and mastery.
Despite their popular associations with science fiction, motion controls have actually played a role in the human-computer interaction space for a while. Here, we’ll look at some real-world examples of motion controls using a variety of devices.
Prototypical Examples of Motion Controls in Gaming
Motion controls have appeared in a variety of fields, including gaming, art, and, most recently, virtual reality. In gaming especially, immersion is often touted as a prime goal. Underlying much of this is the reasoning that if a player’s body is more involved in the play experience, simulating real-world actions, the player will feel more immersed in it.
One of the most successful and direct applications of this thinking is the Dance Central series for the Microsoft Xbox 360 using the Kinect 3D motion tracking camera. Dance Central was preceded by a wide variety of rhythm and dance games, including Dance Dance Revolution, Rock Band, and Guitar Hero. The Kinect sensor made it possible for Dance Central to make use of real-time full body tracking.
In Dance Central, an on-screen avatar performs a repetitive set of dance moves, set to popular music. Players mirror the on-screen avatar and are scored on the accuracy of their movements via the data from the Kinect sensor. As players improve their execution, the difficulty of moves increases, along with the accuracy demanded by the game’s scoring system.
Dance Central and its sequels hit upon a confluence of game and technology to create immersive interactions. A broad set of realistic dance actions are reasonably trackable by the Kinect sensor, and the scoring system is opaque enough to render tracking inaccuracy in the Kinect sensor a non-issue. The player interactions so closely mimic real-world actions that they create a greatly enhanced sense of immersion in comparison to previous motion-control dance titles.
In contrast to the goal of immersion, many successful motion-control titles instead draw attention to the motion controls themselves, focusing on players performing silly or difficult actions to create entertaining experiences. One excellent example is Die Gute Fabrik’s critically acclaimed motion-control game Johann Sebastian Joust.
JS Joust draws on the legacy of folk and field games to create a unique motion-control experience. Each player holds a PlayStation Move controller while an electronic rendition of Johann Sebastian Bach’s Brandenburg Concerto No. 2 plays at a varying tempo. The goal of the game is to make other players move their controllers too quickly: if the accelerometer in a player’s controller detects too high a spike, that player is out. The tempo of the music determines how sensitive the game is to movement.
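As a rough illustration of this mechanic, here is a minimal Python sketch of a tempo-scaled elimination check. It is not Die Gute Fabrik’s code. The threshold values, the `tempo` parameter normalized to [0, 1], and the function names are all hypothetical. It assumes, as the game is commonly described, that slower music means more sensitive controllers, and it treats a “spike” as the deviation of the acceleration magnitude from 1 g (gravity alone).

```python
import math

# Hypothetical thresholds, in g (multiples of gravity); not taken from the actual game.
SLOW_THRESHOLD_G = 0.6  # allowed jolt at the slowest tempo (most sensitive)
FAST_THRESHOLD_G = 1.8  # allowed jolt at the fastest tempo (least sensitive)


def allowed_jolt(tempo: float) -> float:
    """Interpolate the allowed acceleration spike from a tempo in [0, 1].

    0.0 is the slowest tempo, 1.0 the fastest; values outside are clamped.
    """
    tempo = min(max(tempo, 0.0), 1.0)
    return SLOW_THRESHOLD_G + tempo * (FAST_THRESHOLD_G - SLOW_THRESHOLD_G)


def is_eliminated(accel_xyz: tuple[float, float, float], tempo: float) -> bool:
    """Return True if one accelerometer sample (in g) exceeds the tempo-scaled limit.

    A controller held still measures only gravity (magnitude 1 g), so we
    subtract 1 g to turn "no movement" into a jolt of zero.
    """
    magnitude = math.sqrt(sum(a * a for a in accel_xyz))
    jolt = abs(magnitude - 1.0)
    return jolt > allowed_jolt(tempo)
```

A single, tempo-driven threshold keeps the rule legible to players: everyone can hear the music speed up and knows they can now move a little faster.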
In this way, JS Joust draws the player’s attention directly to the motion-control experience. The resulting game dynamics include players tiptoeing around each other as if caught in slow motion. Elegant spins, quick feints, gambits, and more than a small amount of shoving are common. The simplicity and physical freedom of the game also afford interesting social behaviors. As a result, the game draws attention to the players’ bodies and movements and to the playspace itself, rather than immersing them in any particular simulation or fantasy.
These two games sit at opposite ends of a continuum ranging from immersion to self-awareness. Where an application lies along this continuum is not a quality judgement so much as a lens through which to understand and critically evaluate the work.
Prototypical Examples of Motion Control in Interactive Art
Motion control, particularly in the form of computer vision (as in the works of Golan Levin and Brian Knep), has been in use in the new media art world for many years. In this section, we examine two pieces of interactive art that use motion control to create a unique physical and multimedia experience, while also lowering the barrier for viewers to take part in the performative aspects of the works.
Craig Winslow’s 2013 work Growth is an interactive motion-controlled installation featuring projection mapping. In Growth, viewers interact with an abstract virtual jungle, which forms an environment that can be modified by the viewer’s actions.
According to Winslow:
The most powerful moment for me was seeing a mother and her two boys interact with complete awe. Once they knew they were in control of the experience, they waved their hands, wiggled their fingers – but in a very respectful way. It reminded me of a quote by Robert Irwin I was told near the beginning of the project, which influenced our intent more than I knew: “You can’t plan nature; you court her.”
The use of motion control in Growth allowed viewers to interact in a unique, natural fashion, not otherwise afforded by traditional interfaces:
Embracing the natural way we would expect people to interact with the device, we made slow soothing movements augment lighting, while aggressive swipes brought in black recursive animations.... Leap Motion amplified the story we were trying to tell, as the viewer’s human interaction contributed to impact dynamically on the installation.
While Growth allows for natural interactions from individual viewers, Rafael Lozano-Hemmer’s Frequency and Volume is a prime example of how motion control can be used to draw the public into a piece of performance art. In Frequency and Volume, the viewers’ shadows on the gallery wall generate the work. According to Lozano-Hemmer:
Frequency and Volume enables participants to tune into and listen to different radio frequencies by using their own bodies. A computerized tracking system detects participants’ shadows, which are projected on a wall of the exhibition space. The shadows scan the radio waves with their presence and position, while their size controls the volume of the signal. The piece can tune into any frequency between 150 kHz and 1.5 GHz, including air traffic control, FM, AM, shortwave, cellular, CB, satellite, wireless telecommunication systems and radio navigation.
Given this survey of motion control, and an understanding of the platform’s affordances and limitations, we can begin to see what sorts of experiences are a good fit for the Leap Motion Controller. When used well, it can increase immersion; bring attention to people’s bodies and the spaces around them; allow multimedia experiences to respond in natural, understandable ways; make complex and abstract works more discoverable and accessible; and remove fundamental obstacles from interactions, such as being unable to see one’s own hands in virtual reality or having to touch an input device in a sterile or clean environment.
No matter how you’re using the controller, make sure that it fits the use case, and works to augment and enhance people’s experiences with your application.
In their book Brave NUI World (an excellent primer on natural user interface design), Daniel Wigdor and Dennis Wixon define a natural user interface as one that “makes the user feel like a natural,” rather than one that simply mimics the ‘natural’ way of doing things. You’ll commonly hear these sorts of interfaces described as “intuitive.” Unfortunately, a clear account of what makes an application intuitive can be hard to find.
In Introduction to Motion Control, we established a definition for intuitive, based on some of the common ways ‘intuitive’ interfaces are described: To be intuitive, an interface must be learnable, understandable, and habitual.
A common mistake in application design is to first design and build an application, and then figure out how to teach people to use it. This pattern often ends with applications that have steep learning curves and don’t lend themselves to quick, “just sit down and start using it” sorts of interactions.
In thinking about designing learnable applications, we can look to the game development world. Every new game has to teach players how it works, how to control it, and how to deal with the new challenges it presents. There are often complex systems, visual vocabularies, and a large number of rules and statistics the game has to communicate, not only effectively, but in a fun and entertaining way. Game tutorials and “first-time player experiences” are often designed and iterated on for almost as long as the game itself.
When designing an application, it’s important to apply a critical lens towards the teachability and learnability of the design at each stage of the development process. Ask yourself how someone new to your application is going to learn each feature. Empathize with them. Why did they open your application? What are they trying to accomplish? How can you best help them accomplish this goal from the moment they sit down with your application?
Look for learnability in your user testing. Presenting prototype tutorial resources (even simply on paper), instead of talking a test participant through the use of a feature, can provide great insights into the learnability challenges of a particular feature or your entire application. How to teach an individual feature is outside the scope of this article, since different applications and features have vastly different needs. Future documentation will dig further into creating tutorials and first-use experiences.
If we think of a tutorial as a way to help a user ramp up on an application, we must also concern ourselves with the ultimate height of the ramp. With enough instruction, it’s possible to teach someone to use just about any system. However, for the system to be perceived as intuitive, we must go a step further and make sure the interface is immediately understandable.
So how do we do this? The first step is to understand the mental model people bring to your system. How do they think it works? How do they believe their actions will affect your application? Whenever we use an interface, our brains try to build a mental model of how the system works. Part of this comes from the active instruction we mentioned above, but mostly it happens through exploring what effect our inputs have. While the model of the system is apparent to you as its designer, it often isn’t to the end user. Part of why we user test is to gain insight into the user’s model, so that we can either change the inputs to adjust that model or adjust the system to match people’s expectations.
When designing motion control interactions especially, it’s important to help people connect each gesture with the action it will trigger. Because it’s often hard to create visual affordances for motion-based gestures, we must rely in part on other methods to give people a greater understanding. For physics-based interactions, the gestures are usually apparent (drag and drop, clicking, etc.). For more abstract interactions (minimizing or closing an application, changing colors, etc.), there is often no obvious physical metaphor, so you’ll need to come up with some other way of tying gesture and action together.
Sometimes, as in the case of color selection, the visual feedback itself can create a physical metaphor. Imagine showing two cursors in a color space, with hue mapped to one hand and saturation and brightness mapped to the other. Another way to tie gesture and action together is to match the magnitude of the movement to the magnitude of the action: for example, make a fist in the middle of the screen and throw it down to close an application.
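The two-cursor color picker can be sketched as a direct mapping from hand positions to HSV. This minimal Python example assumes both hands’ positions have already been normalized to the range [0, 1] (how you get there depends on your tracking API and is not shown); which hand drives which axes, and the helper names, are illustrative choices. It uses the standard library’s `colorsys.hsv_to_rgb` to produce an RGB color for an on-screen swatch.

```python
import colorsys


def clamp01(value: float) -> float:
    """Clamp a value into [0, 1] so stray hand positions stay in the color space."""
    return min(max(value, 0.0), 1.0)


def hands_to_rgb(left: tuple[float, float],
                 right: tuple[float, float]) -> tuple[float, float, float]:
    """Map two normalized (x, y) hand positions to an RGB color.

    left:  only x is used; it selects the hue.
    right: x selects saturation and y selects brightness (value).
    """
    hue = clamp01(left[0])
    saturation = clamp01(right[0])
    brightness = clamp01(right[1])
    return colorsys.hsv_to_rgb(hue, saturation, brightness)
```

Because each hand controls an independent, clearly visualized dimension, people can see at a glance which movement changes what.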
Once the interfaces we create are easily understandable, we need to ensure that the actions people perform when they first learn something will actually stick, so they can continue to use them.
Habits are not innate, but formed over extended use. While we often try to describe an application as “intuitive” or “not intuitive,” the truth is that most experiences we describe as intuitive become so over time. Learning to drive is a prime example; another is the “simple” act of walking, or riding a bicycle. Each starts off as relatively difficult, but is nonetheless learnable and follows understandable rules. Those rules are also predictable and consistent (within the bounds of each experience). That predictability – paired with good visual, auditory, and haptic feedback – is what allows these complex actions to become mindless habits.
The human brain is excellent at finding, remembering, and responding to patterns. As we perform a task repeatedly, the predictable patterns involved in that task eventually become habits. Our brains abstract away various low-level mechanics, feedback, and actions into basic patterns we learn to recognize. As long as these patterns stay consistent, people can perform habitual tasks more or less mindlessly, which reduces the cognitive load of using your application. Instead of focusing on the interaction, they can focus on the experience they’re having or the task they’re trying to complete.
It’s important to remember that turning actions into habits is a long-term rather than a short-term goal. Learning and understanding must always come first, as habits form over weeks and months of use, not hours or days. The flip side is that once users have learned a habit, it can be very hard to break.
One of the main criticisms of Windows 8 is that it forces users to break longstanding habits. Though clicking ‘Start’ to shut down a computer seems counterintuitive to many users, they’ve learned to do it that way over years and years of experience. When Windows 8 removed the ‘Start’ button, it hamstrung everyone who relied on ‘Start’ to do anything.
To make your applications intuitive, you must leverage a user’s current habits while providing them space and encouragement to learn new ones.
The most difficult part of intuition is that it’s different for different people. We each have our own perspectives, experiences, cultural references, and languages, so what makes sense to one person or group rarely means exactly the same thing to another. Internationalization teams deal with this in very obvious ways every day. Not only do these teams have to change the text that appears in applications, but often the images and cultural references have to change too. (For example, green is associated with money in the United States, and the Western economic connotations of red and green are reversed for some people in Eastern cultures.)
Designers have to deal with this in even more subtle ways. The mental models of computer functionality that we, as software makers, know intimately are as foreign to most people as another language. Even the ways people think about an application and the contexts in which they use it are decidedly different. Without first defining the audience for your application, then researching, understanding, and empathizing with them, and finally testing with them, you stand little chance of building an application that will actually be intuitive to real-world consumers.
The core qualities that define great, usable applications are no different in motion control than on any other application platform. That said, given the challenges of designing for motion control, there are additional critical lenses worth using to evaluate your designs. We covered these briefly in Introduction to Motion Control, and here we’ll dig deeper and look at how they can be applied. Using each of these heuristics to evaluate your motion and gesture designs can help you make solid design decisions early in your development process and focus your observations during usability testing.
The team at Leap Motion is constantly working to improve the accuracy and consistency of our tracking technology. That being said, there will always be limitations to any sensor technology, and the Leap Motion Controller is no exception. When developing a motion or gesture, take the time to have multiple people perform the action, while you watch the resulting data in the diagnostic visualizer. Take note of inconsistencies between multiple people and multiple attempts at the motion by a single person.
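Alongside watching the diagnostic visualizer, it can help to put a number on consistency. The Python sketch below assumes you have recorded a fingertip or palm position (in millimetres) for each frame while a tester holds the end pose of your motion, grouped by tester and attempt. The recording format, the 5 mm threshold, and the function names are hypothetical. It computes the root-mean-square distance of the samples from their mean and flags attempts whose jitter exceeds the threshold.

```python
import math

Sample = tuple[float, float, float]  # (x, y, z) in millimetres


def rms_jitter(samples: list[Sample]) -> float:
    """Root-mean-square distance of each sample from the mean position."""
    n = len(samples)
    mean = tuple(sum(s[axis] for s in samples) / n for axis in range(3))
    squared = [math.dist(s, mean) ** 2 for s in samples]
    return math.sqrt(sum(squared) / n)


def flag_inconsistent(attempts: dict[str, list[Sample]],
                      max_jitter_mm: float = 5.0) -> list[str]:
    """Return labels of attempts whose jitter exceeds max_jitter_mm (hypothetical limit)."""
    return [label for label, samples in attempts.items()
            if rms_jitter(samples) > max_jitter_mm]
```

Labelling attempts by tester (for example `"tester-a/attempt-2"`) makes it easy to see whether inconsistency comes from one person’s style or from the motion itself.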
Taking a look at the two motions above, it’s easy to see that the latter seems to produce inconsistent tracking results. The hand is near the edge of the Controller’s field of view, and the side of the hand is facing the cameras, presenting less surface with which to reconstruct the hand.
Spending the time to make sure tracking is consistent for your particular interactions early will save you headaches down the road.
Once you know the motion you’ve created has relatively consistent tracking, you’ll want a sense of how easy it is to detect. Are there obvious conditions that define the motion? How well is it separated from other motions you might want to detect? Is it obvious when the motion has begun and ended?
On the surface, ease of detection might seem like primarily a technical concern rather than a design one. In reality, like many things, it bleeds well into the space of design. For one, the easier your motions are to detect, the less time you or your developer will spend optimizing detection code, and the more time you can spend improving the overall experience. Easier-to-detect motions will also have lower rates of false-positive and false-negative detections, making the application more usable.
Secondly, and more concretely, the sooner you can accurately detect the beginnings of a motion or gesture, the sooner your interface can provide the proper feedback and behaviors. This will lead to an application that feels more responsive, and makes people feel more in control. It also means you can provide more ways for people to adjust for errors and subtly modify their interactions to fit their particular use patterns.
One example of a gesture that has obvious, easy detection is a pinch. Pinch can actually be a surprisingly difficult motion to work with, due to tracking consistency and occlusion issues – but when those issues are accounted for, it’s fairly easy to define the conditions where:
- a pinch is beginning (when the tips of the thumb and index finger are within a certain range of each other),
- the pinch is at its maximum (when the tips of the thumb and index finger are touching), and
- the pinch has ended (when the tips of the thumb and index finger are no longer within a certain distance of each other).
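Those three conditions translate naturally into a small state machine. The Python sketch below assumes you can read thumb-tip and index-tip positions in millimetres each frame; the threshold values are hypothetical and should be tuned against real tracking data. One design choice worth noting: the release threshold is deliberately larger than the start threshold (hysteresis), so a pinch held near the boundary doesn’t flicker on and off.

```python
import math
from enum import Enum

Point3 = tuple[float, float, float]


class PinchState(Enum):
    IDLE = "idle"            # no pinch in progress
    BEGINNING = "beginning"  # fingertips approaching each other
    PINCHED = "pinched"      # fingertips touching, or held since touching


# Hypothetical thresholds in millimetres.
BEGIN_MM = 40.0  # closer than this: a pinch is beginning
TOUCH_MM = 15.0  # closer than this: fingertips count as touching
END_MM = 50.0    # farther than this: the pinch has ended


class PinchDetector:
    def __init__(self) -> None:
        self.state = PinchState.IDLE

    def update(self, thumb_tip: Point3, index_tip: Point3) -> PinchState:
        """Advance the state machine with one frame of fingertip positions."""
        distance = math.dist(thumb_tip, index_tip)
        if self.state is PinchState.IDLE:
            if distance < TOUCH_MM:
                self.state = PinchState.PINCHED
            elif distance < BEGIN_MM:
                self.state = PinchState.BEGINNING
        elif distance > END_MM:
            self.state = PinchState.IDLE
        elif distance < TOUCH_MM:
            self.state = PinchState.PINCHED
        # Otherwise keep the current state: between TOUCH_MM and END_MM,
        # a beginning pinch stays beginning and a held pinch stays held.
        return self.state
```

Because the detector reports `BEGINNING` as soon as the fingertips approach each other, the interface can start showing feedback before the pinch completes.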
Compare pinch to trying to detect a “keytap” motion. For one, keytap takes perhaps an eighth of a second to take place, with no hardware to press against. There’s a surprising amount of variation in how multiple people will perform the motion, and the “beginning” and “end” of the motion look a lot like the general noise that comes from (a) people’s hands being unstable and (b) simple small motions. That’s not to say ‘keytap’ is a poor motion to use, but when deciding what motions and interactions to build into your app, and what features to map those actions to, understanding the ease of detection of the motion is critical.
Occlusion from various motions commonly comes in two forms. The first, and simplest, is when something about the motion physically covers the sensor. When a person has to reach across their body and the sensor, their sleeve, arm, or jewelry (say, a large watch or a loose bracelet) can prevent the Controller from getting a clear view of their hands, reducing tracking accuracy or preventing tracking entirely. If these sorts of actions are common in your application, consider changing your real-world-to-screen-space mapping, as discussed in Introduction to Motion Control.
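One way to change that mapping is to decide which physical region above the Controller maps onto the full screen, and shift it toward the side of the hand doing the work so people don’t need to reach across the device. The Python sketch below is a hypothetical helper, not part of any Leap Motion SDK. It assumes positions in millimetres in a y-up frame centred on the Controller; the region dimensions are placeholders.

```python
from dataclasses import dataclass


@dataclass
class InteractionRegion:
    """Hypothetical physical region (millimetres) that maps onto the whole screen."""
    center_x: float
    center_y: float
    width: float
    height: float

    def to_screen(self, x_mm: float, y_mm: float,
                  screen_w: int, screen_h: int) -> tuple[float, float]:
        # Normalize the hand position within the region to [0, 1] on each axis.
        nx = min(max((x_mm - self.center_x) / self.width + 0.5, 0.0), 1.0)
        ny = min(max((y_mm - self.center_y) / self.height + 0.5, 0.0), 1.0)
        # Physical y grows upward, but screen y grows downward, so flip it.
        return nx * screen_w, (1.0 - ny) * screen_h


# A region shifted 80 mm to the right, so a right hand can cover the
# whole screen without crossing over the Controller.
right_handed = InteractionRegion(center_x=80.0, center_y=200.0,
                                 width=200.0, height=150.0)
```

With this region, a right hand held 80 mm to the right of the Controller lands in the middle of the screen, and positions outside the region are clamped to the screen edges.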
The second form of occlusion is more subtle and can be particularly troublesome. The Leap Motion Controller uses a pair of infrared cameras in the module. The Leap Motion software uses powerful computer vision and statistical algorithms to reconstruct a probable model of a hand from this data. When the Controller can’t see a part of the hand, it makes assumptions based on the data it has available and an understanding of how the human hand works. Often these assumptions prove quite accurate, but there are times when the system cannot reasonably provide highly accurate results. This means that if your motion or interaction commonly involves occluded parts of the hand, tracking accuracy will be significantly reduced.
One hand covering another, movements of the fingers when the hand is upside down, movements of the fingers when the hand is sideways and off to one extreme side of the field of view, and some motions where multiple fingers curl or come together can all result in this second type of occlusion. It also comes into play when the hand is presented side-on to the device, since only a relatively small surface area is visible to the Controller. Our tracking team is working to improve this all the time, and as our version 2 tracking progresses, many of these occlusion issues are being mitigated. For example, by using data about the position of the arm, our latest updates as of this writing eliminate many of the issues with one hand occluding another. As new updates are released, check back with different gestures and motions to see whether old issues are still relevant.
In many cases, this comes down to testing your actions with the diagnostic visualizer in a variety of areas around the detectable field of view, and watching for inaccuracies caused by occlusion. The more that the gestures and motions used in your design can avoid situations that cause significant occlusion over an extended period of time, the more accurate and responsive your application will be.
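When testing, it can also help to log frames where a hand is in a pose that tends to cause the second kind of occlusion. The Python sketch below flags two of the cases described above: a side-on hand and a hand near the edge of the field of view. It assumes palm positions in millimetres in a y-up frame centred on an upward-facing Controller, and a palm normal vector pointing out of the palm. Both thresholds are hypothetical placeholders, not published tracking limits.

```python
import math

Vec3 = tuple[float, float, float]

# Hypothetical thresholds; tune them by watching the diagnostic visualizer.
SIDE_ON_LIMIT = 0.35   # |vertical component| of the unit palm normal below this counts as side-on
EDGE_ANGLE_DEG = 55.0  # angle from the Controller's vertical axis treated as "near the edge"


def occlusion_risks(palm_position: Vec3, palm_normal: Vec3) -> list[str]:
    """Return labels for poses that tend to hide parts of the hand from the cameras."""
    x, y, z = palm_position
    nx, ny, nz = palm_normal
    risks = []
    # A palm facing toward or away from the upward-looking cameras has a mostly
    # vertical normal; a side-on palm's normal is mostly horizontal.
    if abs(ny) / math.sqrt(nx * nx + ny * ny + nz * nz) < SIDE_ON_LIMIT:
        risks.append("side-on")
    # Angle between the palm position and the vertical axis above the Controller.
    if math.degrees(math.atan2(math.hypot(x, z), y)) > EDGE_ANGLE_DEG:
        risks.append("near-edge")
    return risks
```

Counting how often each label appears while testers perform a motion gives a quick, repeatable signal of which gestures spend too long in poses that are hard to track.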
As society has adopted computers more and more, we’ve come to understand that human bodies aren’t necessarily well-designed to be sitting at desks, typing on keyboards, and using mice for hours every day. Some companies have responded by making input devices which can be used in much more relaxed positions, and there are large research efforts underway to continually improve our posture and working environments.
Since we’re not designing a physical interface, our task as makers of motion-controlled applications is slightly different. As we create affordances and gestures, we have to consider how we’re asking users to move their bodies to perform interactions, and whether those movements could cause long-term harm or strain. We also need to consider how tiring our interactions are, and whether they can be performed from comfortable positions.
The most comfortable position for using most applications is with the elbows resting on the table or the arms of a chair. From this position, each hand moves along a sphere centered on the elbow. The wrist adds some range of motion, but with the elbow planted on the table, that range is very limited. In particular, we must avoid repetitive wrist motions, which can lead to repetitive strain injuries (RSIs) such as carpal tunnel syndrome. Certain actions (rolling the right hand counterclockwise, for example) are particularly difficult from this position and may require users to lift their elbow.
When testing your interactions with people, make sure to conduct the test in a realistic environment and pay attention to and ask your testers about ergonomic issues.
When considering transitions between motions, make sure you have a clear idea of which other motions someone is likely to perform in your application at any given moment. Knowing that set of motions, you’ll be better able to assess whether any transition is likely to be problematic. There are two primary ways in which a transition can cause problems for your application experience.
Interaction Overlap. The first is a situation in which two possible actions are too similar to each other. This can cause issues for both the people using the application and the gesture detection algorithms. Overly similar actions are difficult to remember and are likely to reduce the learnability of your application. Reserve similar actions for situations in which the two actions have highly similar results.
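One rough way to catch interaction overlap early is to record a sample path for each gesture and compare them. The Python sketch below resamples each 2D path to evenly spaced points, removes position and scale, and averages the point-to-point distance, similar in spirit to the preprocessing used by simple template-matching gesture recognizers. It ignores rotation and timing, projecting motions to 2D is an assumption, and any cutoff for “too similar” would be a hypothetical value you tune with your own recordings. `n` must be at least 2.

```python
import math

Point = tuple[float, float]


def resample(path: list[Point], n: int = 32) -> list[Point]:
    """Return n points spaced evenly by arc length along the polyline `path`."""
    cumulative = [0.0]
    for a, b in zip(path, path[1:]):
        cumulative.append(cumulative[-1] + math.dist(a, b))
    total = cumulative[-1]
    if total == 0.0:
        return [path[0]] * n
    points = []
    seg = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(path) - 2 and cumulative[seg + 1] < target:
            seg += 1
        seg_len = cumulative[seg + 1] - cumulative[seg]
        t = 0.0 if seg_len == 0.0 else min((target - cumulative[seg]) / seg_len, 1.0)
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


def normalize(points: list[Point]) -> list[Point]:
    """Translate the centroid to the origin and scale to unit RMS radius."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centered = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / len(centered))
    if scale == 0.0:
        return centered
    return [(x / scale, y / scale) for x, y in centered]


def gesture_distance(a: list[Point], b: list[Point], n: int = 32) -> float:
    """Mean point-to-point distance after resampling and normalizing both paths."""
    pa = normalize(resample(a, n))
    pb = normalize(resample(b, n))
    return sum(math.dist(p, q) for p, q in zip(pa, pb)) / n
```

Pairs of gestures that score close to zero are candidates to merge, to map to similar results, or to redesign so they are easier to tell apart.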
Awkward Switching. The second, more subtle transition issue is awkward switching. In-air motions are highly subject to interpretation by the person making the gesture, and where a motion or gesture begins can strongly influence how people tend to perform it. Without any hardware to provide direct feedback, people may perform the same action differently depending on where they start, which can wreak havoc on your motion detection code. Awkward switches can also cause ergonomic issues, forcing people to move in uncomfortable or overly exaggerated ways. An “initialization” or “resting” pose from which many actions begin can be a good way to reduce the challenge of awkward switching. Make a point to analyze and test the various interaction and motion transitions in your application. Look for places where people are confused, notice where you and your testers are uncomfortable, and be aware of how different transitions affect how people perform particular motions.
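A resting pose can be enforced with a simple gate that ignores gesture detections until the hand has held the rest pose for a few consecutive frames, and re-arms only after the hand returns to it. In this Python sketch, how you decide `in_resting_pose` (for example, a relaxed, palm-down hand held low over the Controller) and the gesture labels are hypothetical and left to your own detection code; the frame count is a placeholder.

```python
from enum import Enum
from typing import Optional


class GateState(Enum):
    WAITING_FOR_REST = "waiting_for_rest"  # ignore gestures until the hand rests
    ARMED = "armed"                        # the next detected gesture is accepted


class RestingPoseGate:
    """Accept a gesture only if the hand has settled into the resting pose since the last one."""

    def __init__(self, frames_to_arm: int = 5) -> None:
        self.frames_to_arm = frames_to_arm  # hypothetical: consecutive rest frames required
        self.rest_frames = 0
        self.state = GateState.WAITING_FOR_REST

    def update(self, in_resting_pose: bool, gesture: Optional[str]) -> Optional[str]:
        """Feed one frame; return the gesture if it is accepted, otherwise None."""
        if self.state is GateState.WAITING_FOR_REST:
            # Count consecutive frames in the resting pose; any break resets the count.
            self.rest_frames = self.rest_frames + 1 if in_resting_pose else 0
            if self.rest_frames >= self.frames_to_arm:
                self.state = GateState.ARMED
            return None
        if gesture is not None:
            # Accept this gesture, then require a fresh return to rest before the next one.
            self.state = GateState.WAITING_FOR_REST
            self.rest_frames = 0
            return gesture
        return None
```

Because every accepted gesture starts from the same pose, people tend to perform it more consistently, which also makes the detection code’s job easier.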
When developing a new interaction, consider how you will provide feedback from the application to the person performing the gesture. As we’ve discussed, the lack of hardware-based physical feedback in motion-based interactions leaves all the onus for communicating the state of the application (and the performance of the person using it) on the application’s user interface. Consider how your interface will communicate whether an action can be performed at all, what it will do, and how someone will know what caused a false positive or false negative detection (so the person using your app can adjust their behavior to avoid it).
At a minimum, the visual feedback for a motion interaction should communicate three things:
- Where am I now in terms of performing this interaction?
- Where do I need to be to complete this interaction?
- How far and in what way do I need to move in physical space to complete or cancel this interaction?
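For a pinch, these three questions reduce to a few numbers an interface can draw every frame. The Python sketch below computes them from the current thumb-to-index distance. The default thresholds and the `PinchFeedback` structure are hypothetical, and how you render the result (a filling ring, a shrinking gap, a sound) is up to your design.

```python
from dataclasses import dataclass


@dataclass
class PinchFeedback:
    progress: float      # where am I now? 0.0 = not started, 1.0 = complete
    target_mm: float     # where do I need to be? fingertip distance that completes the pinch
    remaining_mm: float  # how far to close the fingertips to complete it
    cancel_mm: float     # how far to open the fingertips to cancel it


def pinch_feedback(distance_mm: float, begin_mm: float = 40.0,
                   touch_mm: float = 15.0, end_mm: float = 50.0) -> PinchFeedback:
    """Summarize pinch progress from the thumb-to-index distance (hypothetical thresholds)."""
    progress = (begin_mm - distance_mm) / (begin_mm - touch_mm)
    return PinchFeedback(
        progress=min(max(progress, 0.0), 1.0),
        target_mm=touch_mm,
        remaining_mm=max(distance_mm - touch_mm, 0.0),
        cancel_mm=max(end_mm - distance_mm, 0.0),
    )
```

At a fingertip distance of 27.5 mm, for instance, this reports the pinch as halfway complete, with 12.5 mm left to close and 22.5 mm to open before it cancels.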
As interaction design moves forward, the lenses through which we view our ideas and concepts will shape the experiences we create. Each new task will present its own challenges and require a combination of tried-and-true design patterns and new, never-before-seen solutions. There’s no silver bullet for creating great, intuitive applications – no single design that will solve all our problems – but being attentive to the various challenges that we face can help us tackle them head-on.