Introduction to Motion Control

by Daniel Plemmons and Paul Mandel

For decades, motion controls have held a persistent place in our visions of the future. We’ve watched the superheroes, mad scientists, and space cowboys of popular media control digital experiences with just a wave of their hands. We’ve been captivated by these powerful, natural, and intuitive interactions, imagining what it would be like to have that power at our own fingertips. Tony Stark’s workshop, Star Trek’s ‘Holodeck’, Firefly’s holographic brain scanner, and Minority Report’s pre-crime visioning computers all exude a sense of power and mastery, along with paradoxical senses of simplicity, ease, intuitiveness, and humanity. Simply put, these experiences feel magical, in the Arthur C. Clarke sense.

3D Holographic Brain Scanner from the Firefly episode Ariel (left)
Zion docking controls from The Matrix Reloaded (right)

Over the past decade, we’ve seen staggering advances in consumer technology bring us closer to making these magical experiences a reality, but technology is not the only thing that must advance to bring motion control to the forefront. The design patterns and software paradigms honed over 30+ years of mouse, keyboard, and gamepad interaction design are of surprisingly little use to us, but an understanding of their history is critical. We have to understand what made those interactions great, why they were loved (and sometimes hated), and how we can create new, intuitive interactions for these novel input devices.

Developing “Intuitive” Interfaces

Intuitive is a dangerous term. NUI (Natural User Interface) hardware, like the Leap Motion Controller, often bears the promise of “intuitive” interactions, but when pressed, few developers or designers can articulate what it means to be intuitive. Anyone who has used a poorly designed motion control interface can tell you that simply using the motions of the body to control software does not produce an “intuitive” or “natural” experience. It’s better to break the term down and define what it actually means to be intuitive.

When describing an intuitive experience, one often hears that someone can “just sit down and start using it”. These interfaces are learnable. They have the right instructions, tutorials, visual affordances, and feedback to help people figure out how to use them. To be intuitive, an interaction must be learnable.

That said, when describing something intuitive, you often hear that an interaction “just makes sense.” Even something sorely unintuitive, like, say, managing the safety systems of a nuclear power plant, is learnable. Someone could even learn to use the teapot on the cover of Donald Norman’s famous The Design of Everyday Things. Interactions need to be more than learnable; they need to be understandable.

Certainly learnable, but ‘intuitive’?

Once someone has been told or shown how something works, or has discovered it on their own, they should have an understanding of why it works the way it does. The interface helps build a mental model of how their actions fit into the system, and it should become easy to explain it to someone else. Understandability is critical to an intuitive experience.

The final thing you often hear about an intuitive design is that it’s “second nature,” or that it’s “mindless.” What we’re really describing here is habit. We form habits over time, learning and understanding how something works and making it a part of our routine. When we first learn to drive, it takes effort, but it is learnable and understandable, and eventually becomes habit. For most daily drivers, the act of driving and the controls of a car are intuitive; driving becomes almost instinctual.

A final note on intuitive interfaces: what is ‘intuitive’ to one person may be totally foreign to another. A teenager born in the year 2000 probably hasn’t used a compact disc, or even heard of a floppy disk. These metaphors aren’t going to be intuitive to them, whereas those same metaphors are second nature to an older audience. When designing for an intuitive interaction, make sure you first understand who you’re designing for; otherwise your ability to create an intuitive experience will come down to sheer luck.

So now we have a better understanding of how to make novel, intuitive, natural interfaces. To be intuitive, our interfaces must be:

  1. Learnable
  2. Understandable
  3. Habitual

Beyond intuitiveness, there are a number of factors that will drastically impact your application experience. Things like ergonomics, information design, and reliable gesture detection can make or break the experience. The rest of this document will give you a high-level view of the design challenges and tools you’ll likely encounter while building a Leap Motion Controller application. Further docs will provide in-depth discussions of many of these topics.

Ergonomics, Posture, and Environment

One of the first things you’ll need to ask yourself is what environment your application is going to be used in. Presumably your application will be used by people, and all told, we’re pretty limited creatures. We don’t like holding our arms up for extended periods of time; and though we’re quite adept at using our hands, we can’t hold them perfectly still or make perfect geometric translations. We tend to move and gesture in different ways, sometimes subtly, sometimes obnoxiously. We often trade efficiency for comfort, or vice versa, depending on our situation; and we rarely take the time to actively optimize our physical spaces. In other words, we usually take the path of least resistance.

Depending on the intent of your application, you’ll need to take different ergonomic factors into account. If your application is intended for extended use, consider designing all your interactions so someone can perform them with their elbow resting on a table. If your design needs people’s hands and arms up and moving, design for rest periods and short bursts of interaction. Where will people use your application? On their laptop while sitting in bed, in a chair at their desk, standing up in the kitchen? Each environment comes with its own challenges, limitations, and opportunities.

Consider how much tension your interactions create. It is hard for someone to hold their hand still or make very fine-grained motions in the air for an extended period, and wild motions will definitely have a fatiguing effect. When designing for traditional PC and mobile platforms, these are side concerns at best, but with motion control they are critical to the success of your application.

Virtual Space Mapping and Interaction Resolution

There are a wide variety of ways to translate people’s actions over the Leap Motion Controller into usable input. One of the most common is mapping the 3D space directly to 2D or 3D coordinates in a virtual space.

Technical Sidebar

From a technical perspective, these coordinate conversions are relatively trivial. A common pattern is to convert the real-world millimeter positions from the Leap Motion Controller into a normalized (0, 1) space, then convert those normalized coordinates into your virtual coordinate space. There are simpler patterns as well, but this one gives a lot of easy hooks for customizing the end-user experience.

The following is a pseudocode example:

// Define the Leap Motion interaction space (millimeters, [x, y, z])
Vector3 leapMinimum = [-40, 30, -40];
Vector3 leapMaximum = [40, 80, 40];

// Define the virtual interaction space
Vector3 worldMinimum = [-10, -10, -10];
Vector3 worldMaximum = [10, 10, 10];

Vector3 palmPositionReal = myLeapHand.palmPosition;
Vector3 palmPositionNormal = [0, 0, 0];
Vector3 palmPositionWorld = [0, 0, 0];

// Normalize the Leap Motion coordinates into the (0, 1) range
for ( int i in 0..2 ) {
    palmPositionNormal[i] = (palmPositionReal[i] - leapMinimum[i])
                          / (leapMaximum[i] - leapMinimum[i]);
}

// Convert the normalized coordinates into world coordinates
for ( int i in 0..2 ) {
    palmPositionWorld[i] = worldMinimum[i]
                         + palmPositionNormal[i] * (worldMaximum[i] - worldMinimum[i]);
}

Depending on the needs of your application, you may want to adjust your mapping in a variety of ways. Mapping larger real spaces to smaller virtual ones will increase accuracy and stability, but will require larger motions on the part of someone using the application. Mapping smaller real spaces to larger screen spaces is more sensitive, but people will feel less ‘accurate’, and small jitters of the hand may become more problematic. Generally, you want to make the smallest dimension of your interaction targets at least 1 inch square (or cubed, depending on the interaction) in real-world space. Within reason, the larger you can make this interaction resolution, the more safe and in-control people will feel with your application.
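
As a rough sanity check on your interaction resolution, you can work backwards from the mapping to see how much real-world space a virtual target actually occupies. The following is a minimal Python sketch, assuming the same axis-aligned linear mapping as the pseudocode above; the ranges and the button size are illustrative, not recommendations.

# Check how much real-world space a virtual target occupies, given a
# linear mapping from Leap Motion coordinates (millimeters) to world units.
LEAP_RANGE_MM = 80.0     # e.g. real x runs from -40 mm to +40 mm
WORLD_RANGE = 20.0       # e.g. world x runs from -10 to +10 units
MIN_TARGET_MM = 25.4     # the ~1 inch minimum suggested above

def real_size_mm(virtual_size):
    """Convert a target's size in world units back to real-world millimeters."""
    return virtual_size * (LEAP_RANGE_MM / WORLD_RANGE)

button_width_world = 4.0  # hypothetical button width in world units
width_mm = real_size_mm(button_width_world)
print("Button spans %.1f mm of real space; large enough: %s"
      % (width_mm, width_mm >= MIN_TARGET_MM))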

There’s also the question of interaction height. Depending on your use case, you may want to raise or lower the real space interaction box. Are people using your application with their elbows resting on the table, seated at a desk with their arms raised, standing at an installation, or something completely different?

There are also situations where the same coordinate mapping may not make sense for each hand. For example, to keep someone from having to reach their right hand far to the left to access the left side of the screen, the virtual mapping for the right hand can be biased to the right, and vice versa for the left hand.

Example of biasing real space to screen space mappings per hand
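
One way to sketch this per-hand bias, continuing the normalization pattern from the sidebar above, is to shift the real-space interaction box horizontally depending on which hand is being tracked. The ranges and offset below are illustrative assumptions, not measured values.

# Shift the real-space interaction box per hand so neither hand has to
# reach across the body to hit the far side of the screen.
LEAP_X_MIN, LEAP_X_MAX = -40.0, 40.0   # base interaction box (mm), illustrative
HAND_BIAS_MM = 20.0                    # how far the box shifts toward each hand

def normalized_x(palm_x_mm, is_right_hand):
    # Bias the box toward the side of the active hand.
    offset = HAND_BIAS_MM if is_right_hand else -HAND_BIAS_MM
    x_min, x_max = LEAP_X_MIN + offset, LEAP_X_MAX + offset
    # Clamp, then normalize into (0, 1) as in the sidebar pseudocode.
    clamped = max(x_min, min(x_max, palm_x_mm))
    return (clamped - x_min) / (x_max - x_min)

# A right hand held at the user's midline already lands left of screen
# center, so it never has to stretch far left to reach the screen edge.
print(normalized_x(0.0, is_right_hand=True))   # 0.25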

Interaction Design Heuristics

The Leap Motion API comes with a few built-in gestures, which can be used effectively in the right situations. For most sufficiently complex applications, you’ll probably need to build your own set of interactions, mixing and matching with some of the built-in API gestures, perhaps in ways they weren’t originally intended. In the process of building applications and various UX experiments at Leap Motion, we’ve come up with a useful set of heuristics for critically evaluating our gesture and interaction designs.

Remember that these heuristics exist as lenses through which to critique and examine an interaction, not as hard and fast rules.

  1. Tracking consistency. Some motions and poses are tracked more consistently than others. In addition to watching the motion in the diagnostic visualizer, you can check the “tracking confidence” level the Leap Motion API exposes, which indicates how accurate the tracking thinks it is at that moment (see the sketch after this list). How consistent is tracking for your motions, given many people performing the motions many times in a real-world environment?
  2. Ease of detection. Some motions are very well defined, others are a lot fuzzier. From a production and maintainability perspective, how difficult is it to detect this motion with a high degree of accuracy both in terms of false positive and false negative results?
  3. Occlusion. Given the limitations of the device, does this interaction have a high chance of causing occlusion? For example, an interaction in which someone might reach a hand across the field of view of the device has a high chance of causing the arm or a shirt sleeve to occlude a majority of the hand, or one hand over the other may cause the controller to lose track of the top hand. Version 2 tracking is improving a lot of these issues, but it remains something to consider while building your application.
  4. Ergonomics. Given the limitations of the human body, the intended use case, and environment for the interaction, what are the ergonomic concerns? Can someone perform this interaction while relaxed, or does it create unnecessary tension? Does repeated use become stressful? Does the interaction require an unnatural motion (e.g. trying to poke straight ahead in Z-space while the hands want to move in downward arcs)?
  5. Transitions. Given the complete interaction set in an application, how does this interaction transition to others? Are the transitions clear and easy to learn? How easy are they to detect? What happens in the case of ambiguous transitions? Each transition could easily be re-evaluated with this same set of heuristic lenses.
  6. Feedback. Does this interaction lend itself well to having good feedback from the UI? Feedback can take many forms: visual, audio, and haptic feedback are the most common. A lack of proper UI feedback can sink an otherwise well-designed interaction.
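
As one concrete way to apply the first heuristic, a gesture recognizer can simply refuse to fire when tracking confidence is low. The following is a minimal, framework-agnostic sketch; the attribute names, the 0.4 threshold, and the pinch example are assumptions to tune against the actual Leap Motion API and your own testing.

# Gate gesture detection on tracking confidence so low-quality frames
# don't produce false positives.
CONFIDENCE_THRESHOLD = 0.4   # assumed value; tune against real-world testing

def detect_pinch(hand):
    """Report a pinch only when tracking is trusted and the pose is unambiguous."""
    # 'confidence' and 'pinch_strength' stand in for whatever your tracking
    # layer exposes; treat them as placeholders.
    if hand.confidence < CONFIDENCE_THRESHOLD:
        return False          # not enough data to trust this frame
    return hand.pinch_strength > 0.8

class HandSample:
    """Tiny stand-in for a tracked hand, for demonstration only."""
    def __init__(self, confidence, pinch_strength):
        self.confidence = confidence
        self.pinch_strength = pinch_strength

print(detect_pinch(HandSample(confidence=0.9, pinch_strength=0.95)))   # True
print(detect_pinch(HandSample(confidence=0.2, pinch_strength=0.95)))   # False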

In addition to these heuristics, our designers also make good use of classic tools for critical interaction design analysis. There are more than a few copies of Nielsen’s “10 Usability Heuristics” taped to our office walls. If you’re not familiar with these critical lenses, or the strategy of heuristic analysis, definitely take the time to read up on them.

Dynamic Feedback and Gestalt Intermediates

When designing for traditional input devices, we’re used to binary states: hovering or not; touching or not; mouseDown, or not. With motion control, the experience is defined less by individual states, and more by transitions between those states. To account for this, designers must reconsider the structure of their visual and auditory feedback. Just as the controls use motion, so must the feedback. At Leap Motion, we’ve found ourselves referring to this as “dynamic feedback”.

As people move their hands in front of the controller, the application should constantly respond to their motions; communicating what the interface ‘cares about’ at any one time. This is in contrast to most traditional desktop and mobile design, where the interface only changes when people directly act upon the application. The nearest design analog on desktop is hover effects on buttons. It may help to think of dynamic feedback as “super hover.”
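
A minimal sketch of this kind of “super hover”: rather than a binary hover state, an on-screen element scales its highlight continuously with the fingertip’s distance from it. The distances and the linear falloff below are illustrative assumptions.

# Continuous ("super hover") feedback: highlight strength grows smoothly as
# the fingertip approaches a target instead of flipping on at a boundary.
HOVER_RADIUS_MM = 80.0    # distance at which feedback starts (assumed)
ACTIVATE_MM = 10.0        # distance treated as "touching" (assumed)

def highlight_strength(distance_mm):
    """Map fingertip distance to a 0..1 highlight value."""
    if distance_mm >= HOVER_RADIUS_MM:
        return 0.0
    if distance_mm <= ACTIVATE_MM:
        return 1.0
    # Linear falloff between the activation and hover radii.
    return (HOVER_RADIUS_MM - distance_mm) / (HOVER_RADIUS_MM - ACTIVATE_MM)

for d in (120, 80, 45, 10, 2):
    print(d, round(highlight_strength(d), 2))   # 0.0, 0.0, 0.5, 1.0, 1.0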

The menus in Leap Motion’s application Freeform use constant dynamic feedback to aid usability.

In our prototyping and research at Leap Motion, we’ve found the addition of bold, clear dynamic feedback drastically improves people’s experiences. While developing one of the early Leap Motion applications, Freeform, our design team ran through a rigorous process of prototyping and iteration to develop the UI interactions for the application. In doing so, we developed a set of very successful design patterns and resources for the wider development community. You can find a more detailed discussion of our process and dynamic feedback in Freeform’s UI design in our post on the Leap Motion blog.

Another way to think about visual feedback elements is to create gestalt intermediates. The Gestalt laws of grouping tell us of the principle of “common fate”: our brains tend to group elements which move together. In our research, we found that people didn’t tend to group screen elements with their hands unless the movement on the screen mapped exactly to the movement of their body. Since in many cases the control is not quite this direct (tilt hand to scroll, tap finger to select, etc.), it is important to provide an element on screen which gives close to 1:1 feedback with the motions of people’s bodies. In interfaces that use real-world physical metaphors for interaction, showing a full representation of the hand in virtual space can also help people map their actions to the application. When that element is visually grouped with the element which is actually being affected, the two share a “common fate”. By providing this intermediate step, we let people make the jump from their hand, to the intermediate, to the controlled element, and make the interaction more learnable and understandable.
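
One way to realize such an intermediate, sketched here under the assumption of a simple tilt-to-scroll control: an on-screen cursor follows the palm essentially 1:1, while the scrolled content responds indirectly to the hand’s tilt. Visually grouping the cursor with the scrolling list is what gives them the “common fate” described above; the scroll speed and input names are hypothetical.

# A gestalt intermediate: the cursor mirrors the palm 1:1, while the actual
# control (scroll position) responds only indirectly, to hand tilt.
SCROLL_SPEED = 300.0   # pixels per second at full tilt (assumed)

def update(cursor, scroll_offset, palm_x_px, palm_y_px, tilt, dt):
    # The intermediate element: near-direct mapping from hand to screen.
    cursor["x"], cursor["y"] = palm_x_px, palm_y_px
    # The controlled element: indirect, rate-based response to tilt (-1..1).
    scroll_offset += tilt * SCROLL_SPEED * dt
    return cursor, scroll_offset

cursor = {"x": 0.0, "y": 0.0}
offset = 0.0
cursor, offset = update(cursor, offset, 420.0, 310.0, tilt=0.5, dt=0.016)
print(cursor, round(offset, 1))   # cursor follows the hand; the list scrolls 2.4 px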

Interaction Space Separation, States and Transitions

Detection of gestures and analog motions is inherently fuzzy. If your application has to distinguish between a lot of different possible actions, there is a good chance that it’s going to get something wrong. Worse, with too many simultaneous options, someone using your application is liable to get confused and has a good chance of making a mistake.

We can look to mobile design to help with this. Due to the limitations of the mobile platform, many mobile applications use ‘deep’ interfaces rather than ‘broad’ ones, with each screen offering only a few options. This means that an option may be a few ‘taps’ deep, but each screen has significantly lower cognitive friction and a lower chance of someone taking the wrong action. Compare this to a broad interface like Adobe’s Photoshop, an expert interface with a high learning curve. The impact on the final experience is to reduce the chance of a mistaken action at the expense of obscuring functionality behind multiple actions.

Tweetdeck’s (left) mobile UI is ‘deep’ whereas Hootsuite's (right) desktop UI is ‘broad’. Each interface has its pros and cons.

To pack the most functionality into a single application, you can manage and modify its gesture detection and actions based on the current state of the application. By using ‘deep’ interfaces to limit the number of possible transitions at any one time, and applying a critical lens to the transitions possible from each state, an application can significantly reduce both the cognitive friction and the number of interactions that can damage the end-user experience.

Diagram: generalized gesture transitions in an example application

The above diagram shows an example analysis of the possible motion or gesture transitions in an application. Given each possible set of transitions, gesture detection code can be modified for the specific cases, better separating gestures from each other and reducing the probability of false negatives as well.
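
A minimal way to express this in code is a table from application state to the set of gestures the detector will even consider, so gestures that would be ambiguous or damaging in the current context simply cannot fire. The states and gesture names below are hypothetical.

# Only run the detectors that make sense in the current application state,
# reducing the number of gestures that can be confused with one another.
ENABLED_GESTURES = {
    "main_menu":  {"point_to_select", "swipe_to_page"},
    "sculpting":  {"pinch_to_grab", "two_hand_scale"},
    "confirming": {"point_to_select"},
}

def active_detectors(state, all_detectors):
    """Return only the detectors allowed in this state."""
    allowed = ENABLED_GESTURES.get(state, set())
    return {name: fn for name, fn in all_detectors.items() if name in allowed}

detectors = {"point_to_select": lambda frame: None,
             "swipe_to_page":   lambda frame: None,
             "pinch_to_grab":   lambda frame: None,
             "two_hand_scale":  lambda frame: None}
print(sorted(active_detectors("sculpting", detectors)))   # ['pinch_to_grab', 'two_hand_scale']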

Forgiving Interactions: Visual Interactions, Control Interactions, and Undo

As application designers, we need to take some steps to protect people using our applications, because humans are fallible and software is confusing. This is especially true when people are interacting with the application through a novel interface that isn’t always perfectly accurate. There are a few ways we can mitigate errors and provide recovery options when errors do inevitably happen.

The easiest way to reduce or mitigate user error is to base interactions on recognition rather than recall. That is, have a labeled menu option or button that gives people a very good idea of what will happen when it is activated. This has the added benefit of making the interface much more learnable, as people effectively “relearn” what each interaction does every time they use it.

Though it’s always better to base interactions on recognition rather than recall, relying on recall is sometimes unavoidable. This is one of the reasons command line interfaces are so difficult to learn; it would be prohibitive to have menu options for every terminal command (and even if you see the command name, it’s not always obvious what it does). The problem of recognition vs. recall is also especially difficult in dense gesture spaces where there are many possible interactions.

However, there are a few ways we can mitigate recall-based interactions. To understand how, it’s useful to look at another common recall-only interaction space: the multi-touch trackpad. Every Apple trackpad has a rich gesture space (1, 2, 3, or 4 fingers; swipe up, down, left, right, in, out) and absolutely no direct indication of what any of those gestures will actually do. Apple gets around this by only allowing these interactions to change your view: switching spaces, showing the desktop, showing the launcher, showing only the windows of the current application, and so on. There’s no gesture to close or minimize a window, submit a form, or click a button (except through clicking itself, which is a cultural standard). By forcing recall-based interactions to only change the view, we minimize the chance that an incorrect recall will result in lost work or an unrecoverable error.

The last-ditch option is to provide an ‘undo’. When errors happen, people don’t want the results to be permanent. Giving an ‘undo’ option allows for more control over the system and makes people feel safer, encouraging them to explore and learn more of your system. In terms of gestures, we found that there are two ways (in order of preference) to signal ‘undo’: 1) doing the opposite of the gesture and 2) doing the gesture again. Returning to our previous example, you can find both of these interactions in Apple’s trackpads: after swiping up with three fingers to show all windows across all desktops, either swiping up or down with three fingers will return you to the normal desktop view.
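
However the ‘undo’ gesture itself is detected, the application side can be as simple as a history stack of reversible actions. The sketch below is a minimal example, assuming each action is recorded along with a function that reverses it; the file name is purely illustrative.

# A simple undo stack: every destructive action is recorded with enough
# information to reverse it, so a mistaken gesture is never permanent.
class UndoStack:
    def __init__(self):
        self._history = []

    def perform(self, do_fn, undo_fn):
        do_fn()
        self._history.append(undo_fn)

    def undo(self):
        if self._history:
            self._history.pop()()   # run the most recent reverse action

items = ["sketch.png"]               # hypothetical document state
stack = UndoStack()
stack.perform(lambda: items.clear(),
              lambda: items.append("sketch.png"))
stack.undo()
print(items)   # ['sketch.png'] -- the accidental delete was reversed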

Wrapping up

Motion control requires a significantly different design headspace than many application developers are used to, and like all design, it takes some thought and planning that will pay off in the long run. We are here to help and will continue creating docs and examples to aid in your development.