Camera Images

The Leap Motion controller uses infrared stereo cameras as tracking sensors. You can access the images from these cameras using the Controller.images() or Frame.images() functions. These functions provide an ImageList object, containing the Image objects. Controller.images() provides the most recent set of images. Frame.images() provides the set of images analysed to create that frame and can be slightly older than the images returned by the Controller directly.

An image from one of the cameras. A grid highlighting the significant, complex distortion is superimposed on the image.

The images can be used for:

  • Head-mounted display video pass-through
  • Augmented reality
  • Computer vision

The Image API provides a buffer containing the sensor brightness values and a buffer containing the camera calibration map, which can be used to correct lens distortion and other optical imperfections in the image data.

Image API Basics

Get ImageList objects from either Controller.images() or Frame.images(). The Controller.images() function gives you the most recent images. Frame.images() gives you the images associated with that frame. Since processing the frame takes a bit of time, the images from the frame will be at least one camera frame behind the images obtained from the controller. (In a future version, the data frame rate may be decoupled from the camera frame rate, so the difference could be larger.) Images from the controller have the smallest latency, but won’t match up as well to the tracking data of the current frame. When using Controller.images(), you can implement the onImages() callback in a Listener object. Your Listener.onImages() callback is invoked by the Controller as soon as a new set of images is ready.

Image data is provided as an array of pixel values. The format of this data is reported by the Image.format() value. Currently, one format is in use. This “INFRARED” format uses one byte per pixel, defining the brightness measured for that sensor location. You can display infrared-format data as a greyscale image. Future Leap Motion hardware may provide sensor image data in a different format.

Image Distortion

When a ray of light enters one of the Leap Motion cameras, the lens bends the light ray so that it hits the sensor, which records it as a greyscale brightness value at a specific pixel location. Of course, no lens is perfect, so a ray of light does not land on the sensor in the optically perfect spot. The calibration map provides data to correct this imperfection, allowing you to calculate the true angle of the original ray of light. You can use the corrected angle to generate a distortion-free image, and, using the angles from both images in the stereo pair, you can triangulate the 3D location of a feature identified in both images. Note that the calibration map corrects lens distortion; it does not correct perspective distortion.

For image correction, the distortion data can be fed to a shader program that can efficiently interpolate the correction applied to rays of light. For getting the true angle for a small set of points, you can use the Image.warp() function (but this is not efficient enough to transform a full bitmap at a high frame rate).

The distortion data is based on the angle of view of the Leap Motion cameras. The Image class provides two functions, Image.rayScaleX() and Image.rayScaleY(), which are proportional to view angles large enough to ensure that the distortion map covers the entire view, about 150 degrees for the current Leap Motion peripheral. A 150-degree angle of view means that a ray of light passing through the lens can have a maximum slope of 4/1.

A view angle of 150 degrees corresponds to a slope of ±4 (the tangent of 75 degrees is approximately 4)

The image above shows a reconstruction of the distortion-corrected image data. The brightness value of each pixel in the image originated from a ray of light entering the camera from a specific direction. The image is reconstructed by calculating the horizontal and vertical slopes represented by each pixel and finding the true brightness value from the image data using the calibration map. The red portions of the image represent areas within the rendering for which no brightness value is available (the actual field of view is less than 150 degrees).
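The slope arithmetic behind this reconstruction can be sketched in isolation. The scale and offset constants below are illustrative assumptions chosen so that the slope range [-4..4] maps onto [0..1]; a real application would read them from Image.rayScaleX(), Image.rayScaleY(), Image.rayOffsetX(), and Image.rayOffsetY():

```cpp
#include <cassert>
#include <cmath>

// Illustrative constants: a ray scale of 1/8 and an offset of 0.5 map the
// slope range [-4..4] onto [0..1]. Real values come from the Image class
// at runtime.
const float kRayScale  = 0.125f;
const float kRayOffset = 0.5f;

// Convert a ray slope in [-4..4] to a normalized coordinate in [0..1]
float normalizeSlope(float slope) {
    return slope * kRayScale + kRayOffset;
}

// Convert a normalized coordinate back to a slope (the inverse operation,
// as used when rasterizing the undistorted image)
float denormalize(float normalized) {
    return (normalized - kRayOffset) / kRayScale;
}

// The maximum slope needed for a 150-degree view: tan(75 degrees) is about
// 3.73, so a range of plus or minus 4 covers the whole view with a margin.
float maxViewSlope() {
    return std::tan(75.0 * 3.14159265358979 / 180.0);
}
```

The round trip between slopes and normalized coordinates is exactly what the warp() rasterization example later in this article performs, just with the device-reported constants.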

Image Orientation

The top of the image is always toward the negative direction of the z-axis of the Leap Motion coordinate system. By default, the Leap Motion software automatically adjusts the coordinate system so that hands enter from the positive direction of the z-axis. (Users can disable auto-orientation using the Leap Motion control panel.) Before hands are inserted into the field of view, it isn’t possible to know which way the images are oriented, since the user can typically place or mount the device in either physical orientation (i.e. with the green LED on the long side of the device facing one way or the other). If the user places the device in the opposite way than you expect, the images will be upside down until they put their hands into view (or turn the device itself around).

Get the Raw Images

Before you can get image data, you must set the POLICY_IMAGES flag using the Controller.setPolicy() function. For privacy reasons, each user must also enable the feature in the Leap Motion control panel for any application to get the raw camera images.
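In C++ this is a single call (a sketch; `controller` is assumed to be a connected Leap::Controller instance):

```cpp
controller.setPolicy(Leap::Controller::POLICY_IMAGES);
```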


To get the image data, use either the Controller.images() or the Frame.images() function. Since the Leap Motion peripheral has two cameras, these functions return an ImageList object that contains two images (this could change in the future if multiple Leap Motion devices can be active at the same time). The image at index 0 is the left camera; the image at index 1 is the right camera. Note that the left-right orientation of the peripheral can be detected automatically based on the direction from which the user inserts his or her hand into the field of view. Detection is enabled by the auto-orientation setting in the Leap Motion control panel.

Once you have an Image object, you can get the 8-bit brightness values from the data() buffer. The length of this buffer is Image.width() times Image.height() times Image.bytesPerPixel(). The width and height of the image change with the current operating mode of the controller, which can change from frame to frame. Note that in “robust mode,” the images are half as tall.
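The buffer arithmetic can be sketched on its own. The dimensions below are made-up stand-ins for Image.width() and Image.height(); the indexing assumes the one-byte-per-pixel INFRARED format with rows stored consecutively:

```cpp
#include <cassert>
#include <cstddef>

// Compute the length in bytes of the image data buffer
size_t bufferLength(int width, int height, int bytesPerPixel) {
    return (size_t)width * height * bytesPerPixel;
}

// Compute the buffer index of the pixel at (x, y) for a
// one-byte-per-pixel image: rows are stored one after another
size_t pixelIndex(int x, int y, int width) {
    return (size_t)y * width + x;
}
```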

The following example gets the image list from a frame and copies the brightness values from the data() buffer to a bitmap (a Cinder Surface object, in this case). The example uses the Cinder OpenGL API for drawing:

//Uses Cinder OpenGL wrapper
Frame frame = controller.frame();

ImageList images = frame.images();
for(int i = 0; i < 2; i++){
    Image image = images[i];

    const unsigned char* image_buffer =;

    //Draw the raw image data as a greyscale bitmap
    Surface surface(image.width(), image.height(), image.width() * 4, SurfaceChannelOrder::RGBA);
    int cursor = 0;
    Surface::Iter iter = surface.getIter();
    while( iter.line() ) {
        while( iter.pixel() ) {
            iter.r() = image_buffer[cursor];
            iter.g() = iter.b() = iter.r();
            iter.a() = 255;
            cursor++;
        }
    }

    gl::Texture texture(surface, gl::Texture::Format());
    gl::draw(texture, Vec2f(0, i * image.height())); //stack the two camera images vertically
}
Get the Calibration Map

The calibration map can be used to correct image distortion due to lens curvature and other imperfections. The map is a 64x64 grid of points. Each point consists of two 32-bit floating point values, so the buffer size is 128 times 64 times 4 bytes. You can get the calibration map buffer using the Image.distortion() function.

Each point in the buffer indicates where to find the corrected brightness value for the corresponding pixel in the raw image. Valid coordinates are normalized in the range [0..1]. Individual elements of the calibration map can have a value in the range [-0.6..2.3], but coordinates below zero or above 1 are invalid. Discard values outside the range [0..1] when using the calibration data.

To convert to pixel coordinates multiply by the width or height of the image. For pixels that lie in between the calibration grid points, you can interpolate between the nearest grid points. The camera lenses have a very large angle of view (roughly 150 degrees) and have a large amount of distortion. Because of this, not every point in the calibration grid maps to a valid pixel. The following rendering shows the lens correction data as color values. The left image shows the x values; the right side shows the y values.

The red values indicate map values that fall outside the image.
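The validity check and pixel conversion described above can be sketched as follows (the function names are hypothetical helpers, not part of the Leap API):

```cpp
#include <cassert>

// A calibration map entry is valid only when both normalized
// coordinates fall in the range [0..1]
bool isValidMapValue(float dX, float dY) {
    return dX >= 0 && dX <= 1 && dY >= 0 && dY <= 1;
}

// Convert a valid normalized coordinate to a pixel coordinate by
// scaling with the image dimension (width for x, height for y)
float toPixelCoordinate(float normalized, int imageDimension) {
    return normalized * imageDimension;
}
```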

The size of the calibration map is subject to change in the future, so the Image class provides the grid dimensions with the distortionWidth() (actually twice the width to account for two values per grid point) and distortionHeight() functions. The length of the buffer containing the calibration data is distortionWidth() times distortionHeight() times 4 bytes.
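The layout these dimensions imply can be sketched as follows. With a distortionWidth() of 128 (64 grid columns times two values) and a distortionHeight() of 64, the x value for grid point (i, j) sits at float offset j * distortionWidth + 2 * i, with the y value immediately after it; this indexing is an assumption consistent with the bilinear-interpolation example later in this article:

```cpp
#include <cassert>
#include <cstddef>

// Offsets (in floats) of the x and y lookup values for calibration grid
// point (i, j), given the row width in floats (i.e. 2 * grid columns)
size_t gridXOffset(int i, int j, int distortionWidth) {
    return (size_t)j * distortionWidth + 2 * i;
}
size_t gridYOffset(int i, int j, int distortionWidth) {
    return gridXOffset(i, j, distortionWidth) + 1;
}

// Total buffer length in floats and in bytes
size_t bufferFloats(int distortionWidth, int distortionHeight) {
    return (size_t)distortionWidth * distortionHeight;
}
size_t bufferBytes(int distortionWidth, int distortionHeight) {
    return bufferFloats(distortionWidth, distortionHeight) * 4; //4 bytes per 32-bit float
}
```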

The following example illustrates how to get the calibration data:

const float* distortion_buffer = image.distortion();
for (int d = 0; d < image.distortionWidth() * image.distortionHeight(); d += 2) {
    float dX = distortion_buffer[d];
    float dY = distortion_buffer[d + 1];
    if((dX >= 0) && (dX <= 1) && (dY >= 0) && (dY <= 1)) {
        //Use valid calibration data
    }
}

Image Ray Correction

You can correct the raw image distortion in two ways:

  • Using the Image.warp() and Image.rectify() functions
  • Using the calibration map returned by Image.distortion(), typically in a GPU shader program

The warp() and rectify() functions are the simpler method, but processing each pixel individually on the CPU is relatively slow. Use these functions if you are only correcting a few points, you don’t need to process data in real time, or when you cannot use GPU shaders. The distortion() buffer is designed to be used with a GPU shader program and can correct the entire raw image while maintaining a good application frame rate.

Correction using Image.warp()

Image.warp() takes a ray direction and returns the pixel coordinates into the raw image data that specify the brightness value recorded for that ray direction.

The following example uses the warp() function to rasterize the distortion-corrected image.

To create the image, the example creates a target bitmap and then, for each pixel, computes the direction of the ray of light that would strike the pixel through an ideal lens. The warp() function is then used to look up the correct brightness value.

//Draw the undistorted image using the warp() function
Vec2f origin = Vec2f(0,0);
int targetWidth = 400;
int targetHeight = 400;
Surface targetImage(targetWidth, targetHeight, targetWidth*4, SurfaceChannelOrder::RGBA);

//Iterate over the target image pixels, converting xy to ray slope
unsigned char brightness[4] = {0,0,0,255}; //An array to hold the rgba color components
Surface::Iter tI = targetImage.getIter();
while( tI.line() ) {
    while( tI.pixel() ) {

        //Normalize from pixel xy to the range [0..1]
        Vector input = Vector((float)tI.x()/targetWidth, (float)tI.y()/targetHeight, 0);

        //Convert from normalized [0..1] to slope [-4..4]
        input.x = (input.x - image.rayOffsetX()) / image.rayScaleX();
        input.y = (input.y - image.rayOffsetY()) / image.rayScaleY();

        Vector pixel = image.warp(input);

        if(pixel.x >= 0 && pixel.x < image.width() && pixel.y >= 0 && pixel.y < image.height()) {
            int data_index = floor(pixel.y) * image.width() + floor(pixel.x); //xy to buffer index
            brightness[0] =[data_index]; //Look up the brightness value
            brightness[2] = brightness[1] = brightness[0]; //Greyscale
        } else {
            brightness[0] = 255; //Display invalid pixels as red
            brightness[2] = brightness[1] = 0;
        }
        tI.r() = brightness[0];
        tI.g() = brightness[1];
        tI.b() = brightness[2];
        tI.a() = brightness[3];
    }
}
gl::Texture targetTexture(targetImage, gl::Texture::Format());
gl::draw(targetTexture, Rectf(origin, origin + Vec2f(targetWidth, targetHeight)));

The example uses Cinder and draws each pixel in the image. This is not a particularly efficient way to draw the undistorted image, however.

Correction using Shaders

A more efficient way to correct the entire image is to use a GPU shader program. Pass the image data to a fragment shader as a normal texture and the distortion data as encoded textures. You can then texture a quad by decoding the distortion data and using that to look up the correct brightness value in the image texture.

The following example creates an RGBA texture with each 8-bit color component set to the raw brightness value:

const unsigned char* image_buffer =;
Surface surface(image.width(), image.height(), image.width() * 4, SurfaceChannelOrder::RGBA);
int cursor = 0;
Surface::Iter iter = surface.getIter();
while( iter.line() ) {
    while( iter.pixel() ) {
        iter.r() = image_buffer[cursor];
        iter.g() = iter.b() = iter.r();
        iter.a() = 255;
        cursor++;
    }
}

gl::Texture texture(surface, gl::Texture::Format());

The distortion map is encoded into an RGBA32F texture, which stores a 32-bit floating point number for each color channel. Only the red and green channels are used: the red component holds the x-lookup values, while the green component holds the y-lookup values. If available, you can also use a two-channel floating point texture format, such as RG32F, to save texture memory.

const float* distortion_buffer = image.distortion();

//Encode the distortion/calibration map into a texture, r for X, g for Y
Surface32f distortion(image.distortionWidth()/2, image.distortionHeight(), false);
cursor = 0;
Surface32f::Iter dI = distortion.getIter();
while( dI.line() ) {
    while( dI.pixel() ) {
        dI.r() = distortion_buffer[cursor];
        dI.g() = distortion_buffer[cursor + 1];
        dI.b() = dI.a() = 1.0;
        cursor += 2;
    }
}

gl::Texture::Format textureFormat;
textureFormat.setInternalFormat(GL_RGBA32F_ARB); //store 32-bit floats per channel
gl::Texture distortionTexture(distortion, textureFormat);

The following simple shaders illustrate how to use these textures. The vertex program is a simple pass-through; its only requirement is that it pass the varying texture coordinate for the target texture to the fragment shader. The fragment program uses the texture coordinate to look up the index values in the distortion texture map. It then uses the index values to look up the final color for the target texture in the rawData texture.

#define STRINGIFY(s) #s

static const string GLSL_VERT_IMGPROC = STRINGIFY(
    varying vec2 distortionLookup;

    void main() {
        gl_FrontColor = gl_Color;
        distortionLookup = vec2(gl_MultiTexCoord0);
        gl_Position = ftransform();
    }
);
static const string GLSL_FRAG_IMGPROC = STRINGIFY(
    uniform sampler2D   rawData;
    uniform sampler2D   distortion;

    varying vec2 distortionLookup;

    void main(void) {

        vec4 indexIntoRawData = texture2D(distortion, distortionLookup);

        if(indexIntoRawData.r > 0.0 && indexIntoRawData.r < 1.0
            && indexIntoRawData.g > 0.0 && indexIntoRawData.g < 1.0) {
            gl_FragColor = texture2D(rawData, indexIntoRawData.rg);
        } else {
            gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); //show invalid pixels as red
        }
    }
);
Finally, the application uses this shader with the textures to draw a textured rectangle displaying the corrected image.

int targetWidth = 400;
int targetHeight = 400;
Surface targetImage(targetWidth, targetHeight, targetWidth*4, SurfaceChannelOrder::RGBA);

gl::Texture targetTexture(targetImage, gl::Texture::Format());
texture.bind(3);            //the raw image texture
distortionTexture.bind(1);  //the distortion map texture
warpShader.bind();
warpShader.uniform("rawData", 3);
warpShader.uniform("distortion", 1);

gl::drawSolidRect( Rectf(origin, origin + Vec2f(targetWidth, targetHeight)) );
warpShader.unbind();


Encoding Distortion Data in a 32-bit ARGB Texture

If a 32-bit-per-component texture format is not available on your target platform, you can use a separate texture for the x and y lookup values and encode the floating point values into multiple 8-bit color components. You then have to decode the values before using them to look up the raw brightness values.

A common method for encoding floating point data in a texture is to decompose the input value into four lower-precision values and then restore them in the shader. For example, you can encode a floating point number into a Color object that has four 8-bit components as follows:

ColorA encodeFloatRGBA(float input)
{
    input = (input + 0.6)/2.3; //scale the input value to the range [0..1]
    float r = input;
    float g = input * 255;
    float b = input * 255 * 255;
    float a = input * 255 * 255 * 255;

    r = r - floorf(r);
    g = g - floorf(g);
    b = b - floorf(b);
    a = a - floorf(a);

    return ColorA(r, g, b, a);
}

To recompose the value in the fragment shader, you look up the value in the texture and perform the reciprocal operation. To avoid losing too much precision, encode the x and y distortion values in separate textures. Once the distortion indices are sampled from the textures and decoded, you can look up the correct brightness value from the camera image texture.

uniform sampler2D texture;
uniform sampler2D vDistortion;
uniform sampler2D hDistortion;

varying vec2 distortionLookup;
varying vec4 vertColor;
varying vec4 vertTexCoord;

const vec4 decoderCoefficients = vec4(1.0, 1.0/255.0, 1.0/(255.0*255.0), 1.0/(255.0*255.0*255.0));

void main() {
  vec4 vEncoded = texture2D(vDistortion,;
  vec4 hEncoded = texture2D(hDistortion,;
  float vIndex = dot(vEncoded, decoderCoefficients) * 2.3 - 0.6;
  float hIndex = dot(hEncoded, decoderCoefficients) * 2.3 - 0.6;

  if(vIndex >= 0.0 && vIndex <= 1.0
        && hIndex >= 0.0 && hIndex <= 1.0) {
      gl_FragColor = texture2D(texture, vec2(hIndex, vIndex)) * vertColor;
  } else {
      gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); //show invalid pixels as red
  }
}
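You can sanity-check the encode/decode pair on the CPU. The sketch below mirrors the shader's dot-product decode, including the 8-bit quantization the texture introduces; a small residual error is expected because the simple frac-based encoding does not carry borrow terms between channels:

```cpp
#include <cassert>
#include <cmath>

// Encode a distortion value from [-0.6..2.3] into four 8-bit channels,
// mirroring encodeFloatRGBA(), then quantize as an 8-bit texture would
void encodeToBytes(float input, unsigned char out[4]) {
    float v = (input + 0.6f) / 2.3f; //scale to [0..1]
    float channels[4] = {
        v,
        v * 255,
        v * 255 * 255,
        v * 255 * 255 * 255
    };
    for (int c = 0; c < 4; c++) {
        float frac = channels[c] - std::floor(channels[c]);
        out[c] = (unsigned char)(frac * 255.0f + 0.5f); //8-bit quantization
    }
}

// Decode with the same dot product the fragment shader uses,
// then rescale back to [-0.6..2.3]
float decodeFromBytes(const unsigned char in[4]) {
    const float coeff[4] = {1.0f, 1.0f/255, 1.0f/(255.0f*255), 1.0f/(255.0f*255*255)};
    float v = 0;
    for (int c = 0; c < 4; c++) {
        v += (in[c] / 255.0f) * coeff[c];
    }
    return v * 2.3f - 0.6f;
}
```

The round-trip error stays well under one percent of the value range, which is adequate for looking up pixels in the raw image.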

Correction using bilinear interpolation

In situations where shaders are not feasible, you may be able to correct image distortion faster with well-optimized bilinear interpolation than with the warp() function. (As with any such optimization, you should verify your results with performance testing.)

Recall that the distortion map contains a 64x64 element grid. Imagine these grid elements evenly spread out over your target image (with element [0, 0] in the lower-left corner and element [63, 63] in the upper-right). Each element contains a horizontal coordinate and a vertical coordinate identifying where in the sensor image data to find the recorded brightness for that pixel in the target image. To find the brightness values for pixels in between the distortion grid elements, you have to interpolate between the four nearest grid points.

The base algorithm for finding the distortion-corrected brightness for a given pixel in the target image is:

  1. Find the four points in the calibration grid surrounding the target pixel.
  2. Calculate the interpolation weights based on the distance of the target to each surrounding grid point.
  3. Look up the horizontal and vertical values at each of the four grid elements.
  4. Bilinearly interpolate the horizontal value using the distance-based weighting factors.
  5. Repeat this interpolation for the vertical value.
  6. Reject any points where either the horizontal or vertical value is outside of the range [0..1]. There is no recorded data for such points.
  7. Denormalize the values so that they represent pixel coordinates into the raw sensor data.
  8. Look up the sensor value at the computed pixel coordinates.
  9. Set this brightness value at the original coordinates in the target image.

The following example fills a two-dimensional array with the distortion corrected image:

float destinationWidth = 320;
float destinationHeight = 120;
unsigned char destination[(int)destinationWidth][(int)destinationHeight];

//define needed variables outside the inner loop
float calibrationX, calibrationY;
float weightX, weightY;
float dX, dX1, dX2, dX3, dX4;
float dY, dY1, dY2, dY3, dY4;
int x1, x2, y1, y2;
int denormalizedX, denormalizedY;
int i, j;

const unsigned char* raw =;
const float* distortion_buffer = image.distortion();

//Local variables for values needed in loop
const int distortionWidth = image.distortionWidth();
const int width = image.width();
const int height = image.height();

for (i = 0; i < destinationWidth; i++) {
    for (j = 0; j < destinationHeight; j++) {
        //Calculate the position in the calibration map (still with a fractional part)
        calibrationX = 63 * i/destinationWidth;
        calibrationY = 62 * (1 - j/destinationHeight); // The y origin is at the bottom
        //Save the fractional part to use as the weight for interpolation
        weightX = calibrationX - truncf(calibrationX);
        weightY = calibrationY - truncf(calibrationY);

        //Get the x,y coordinates of the closest calibration map points to the target pixel
        x1 = calibrationX; //Note truncation to int
        y1 = calibrationY;
        x2 = x1 + 1;
        y2 = y1 + 1;

        //Look up the x and y values for the 4 calibration map points around the target
        dX1 = distortion_buffer[x1 * 2 + y1 * distortionWidth];
        dX2 = distortion_buffer[x2 * 2 + y1 * distortionWidth];
        dX3 = distortion_buffer[x1 * 2 + y2 * distortionWidth];
        dX4 = distortion_buffer[x2 * 2 + y2 * distortionWidth];
        dY1 = distortion_buffer[x1 * 2 + y1 * distortionWidth + 1];
        dY2 = distortion_buffer[x2 * 2 + y1 * distortionWidth + 1];
        dY3 = distortion_buffer[x1 * 2 + y2 * distortionWidth + 1];
        dY4 = distortion_buffer[x2 * 2 + y2 * distortionWidth + 1];

        //Bilinear interpolation of the looked-up values:
        // X value
        dX = dX1 * (1 - weightX) * (1 - weightY) +
             dX2 * weightX * (1 - weightY) +
             dX3 * (1 - weightX) * weightY +
             dX4 * weightX * weightY;

        // Y value
        dY = dY1 * (1 - weightX) * (1 - weightY) +
             dY2 * weightX * (1 - weightY) +
             dY3 * (1 - weightX) * weightY +
             dY4 * weightX * weightY;

        // Reject points outside the range [0..1]
        if((dX >= 0) && (dX <= 1) && (dY >= 0) && (dY <= 1)) {
            //Denormalize from [0..1] to [0..width] or [0..height]
            denormalizedX = dX * width;
            denormalizedY = dY * height;

            //look up the brightness value for the target pixel
            destination[i][j] = raw[denormalizedX + denormalizedY * width];
        } else {
            destination[i][j] = -1; //assigned to an unsigned char, -1 marks invalid pixels as 255 (white)
        }
    }
}
Draw Tracking Data over Image

It is reasonably straightforward to draw representations of the Leap Motion tracking data over the camera image. If you have drawn the raw image data to a bitmap, you can find the pixel corresponding to a Leap Motion position using the warp() function.

Converting a position in Leap Motion coordinates to horizontal and vertical slopes (from the camera perspective) requires knowing how far the cameras are from the origin of the Leap Motion coordinate system. For the current peripheral version, the offset on the x-axis is 20mm to either side. The cameras are on the x-axis, so there is no z offset. The slope is simply the distance from the camera in the image plane (the x-coordinate for the horizontal slope; the z-coordinate for the vertical slope) divided by the distance to the image plane, the y-coordinate. The following diagram illustrates the geometry for the horizontal slope:

The calculation is shown for the left camera; add the offset distance instead of subtracting for the right camera.
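A numerical sketch of this calculation (the 20 mm offset matches the current peripheral, and the sign conventions follow the warp() example later in this section):

```cpp
#include <cassert>
#include <cmath>

// Horizontal and vertical ray slopes of a Leap Motion position as seen
// from one of the cameras. The cameras sit on the x-axis, 20 mm to
// either side of the origin; y is the distance above the device.
struct Slopes { float horizontal; float vertical; };

// cameraSign is -1 for one camera and +1 for the other
Slopes slopesForPosition(float x, float y, float z, float cameraSign) {
    const float camera_offset = 20; //millimeters
    Slopes s;
    s.horizontal = -(x + camera_offset * cameraSign) / y;
    s.vertical   = z / y;
    return s;
}
```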

Once you know the ray slope values, you can get the pixel coordinates using warp().

Note: The offset can be different for different form factors of the Leap Motion controller, but there is currently no way to get this value from the API.

The following example draws circles over the finger tips in the image.

Vec2f origin = Vec2f(0,0);
const float camera_offset = 20; //x-axis offset of the cameras in millimeters
FingerList allTheFingers = frame.fingers();
for(FingerList::const_iterator fl = allTheFingers.begin(); fl != allTheFingers.end(); fl++) {
    Vector tip = (*fl).tipPosition();
    //i is the index of the current image: 0 for the left camera, 1 for the right
    float h_slope = -(tip.x + camera_offset * (2 * i - 1))/tip.y;
    float v_slope = tip.z/tip.y;

    Vector pixel = image.warp(Vector(h_slope, v_slope, 0));
    gl::color(.5, 0, 1, .5);
    gl::drawSolidCircle(Vec2f(pixel.x + origin.x, pixel.y + origin.y), 10);
}

If you have rendered the corrected image data, then correlating tracking data to the image data depends on how you rendered the image. For 3D scenes, this is a matter of using a consistent scale and correct placement of the textured quad showing the camera image. For other types of rendering, you must convert the ray slopes representing a Leap Motion position to a target image pixel according to the way that you corrected the image data.

The following example displays finger tip positions on the distortion corrected image rendered in the example above.

FingerList frameFingers = frame.fingers();
for(FingerList::const_iterator fl = frameFingers.begin(); fl != frameFingers.end(); fl++) {
    //Convert the finger tip position to a ray from the camera POV
    //(i is the index of the current image: 0 for the left camera, 1 for the right)
    Vector tip = (*fl).tipPosition();
    float horizontal_slope = -(tip.x + 20 * (2 * i - 1))/tip.y;
    float vertical_slope = tip.z/tip.y;

    //Normalize the ray from [-4..4] to [0..1] (the inverse of how the undistorted image was drawn earlier)
    Vector ray = Vector(horizontal_slope * image.rayScaleX() + image.rayOffsetX(),
                        vertical_slope   * image.rayScaleY() + image.rayOffsetY(), 0);

    //Convert from normalized [0..1] to pixel coordinates [0..width/height]
    Vector pixel = Vector(ray.x * targetWidth, ray.y * targetHeight, 0);
    gl::color(.5, 0, 1, .5);
    gl::drawSolidCircle(Vec2f(pixel.x + origin.x, pixel.y + origin.y), 5);
}

Calculate the Direction to an Image Feature

Get the direction to an image feature with the Image.rectify() function. Image.rectify() returns a vector containing the horizontal and vertical slopes (as defined from the camera point of view) given the pixel coordinates in the raw image data.

The following example illustrates how to use the slope data for a pair of pixels that represent the same point on an image feature. If you can match two pixels in the stereo image pair with sufficient accuracy, you can triangulate the 3D position using the set of slope values from the two camera images.

Leap::Image image_left = frame.images()[0];
Leap::Image image_right = frame.images()[1];

Leap::Vector slopes_left = image_left.rectify(left_camera_pixel);
Leap::Vector slopes_right = image_right.rectify(right_camera_pixel);

//Do the triangulation from the rectify() slopes
float cameraZ = 40/(slopes_right.x - slopes_left.x); //40 mm is the baseline between the two cameras
float cameraY = cameraZ * slopes_right.y;
float cameraX = cameraZ * slopes_right.x - 20; //20 mm is the right camera's offset from the origin
Leap::Vector position = Leap::Vector(cameraX, -cameraZ, cameraY); //remap from the camera frame to Leap Motion coordinates
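The triangulation geometry can be checked with a self-contained round trip: place a point, compute the slope each camera would report, and recover the point. This sketch uses a simplified sign convention (slope equals the horizontal distance from the camera divided by the height), so it illustrates the geometry rather than the exact convention rectify() uses:

```cpp
#include <cassert>
#include <cmath>

// Two cameras sit on the x-axis at -20 mm and +20 mm (a 40 mm baseline).
// Each reports the slope of the ray to the point: the x distance from
// the camera divided by the height y, plus a vertical slope z / y.
struct Point3 { float x, y, z; };

Point3 triangulate(float hSlopeLeft, float hSlopeRight, float vSlope) {
    Point3 p;
    // The slope difference shrinks as the point gets farther away:
    // hSlopeLeft - hSlopeRight = baseline / y
    p.y = 40.0f / (hSlopeLeft - hSlopeRight);
    p.x = hSlopeLeft * p.y - 20.0f; //undo the left camera's offset
    p.z = vSlope * p.y;
    return p;
}
```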

Head-Mounted Display Mode

The Leap Motion service/daemon software provides a mode that optimizes tracking when the Leap Motion hardware is attached to a head-mounted display. In this mode, the Leap Motion software expects to view hands from the top rather than the bottom. When it is ambiguous whether the palm of a hand is facing toward or away from the Leap Motion sensors, setting this mode makes it more likely that the software will initialize the hand model so that it is facing away from the sensors. This makes the mode well suited to a Leap Motion device mounted on the face of a head-mounted display rig.

To turn on the mode in your application, enable the optimize HMD policy:
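In C++ this looks like the following (a sketch assuming the v2 SDK's POLICY_OPTIMIZE_HMD flag and a connected Leap::Controller instance):

```cpp
controller.setPolicy(Leap::Controller::POLICY_OPTIMIZE_HMD);
```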


The policy is always denied for hardware that cannot be mounted on an HMD, such as controllers embedded in laptops or keyboards.