The Leap Motion controller uses infrared stereo cameras as tracking sensors. You can access the images from these cameras using the |Controller.images|_ or |Frame.images|_ functions. These functions provide an |ImageList|_ object, containing the Image objects. |Controller.images| provides the most recent set of images. |Frame.images| provides the set of images analysed to create that frame and can be slightly older than the images returned by the Controller directly.
An image from one of the cameras. A grid highlighting the significant, complex distortion is superimposed on the image.
The images can be used for:
The Image API provides a buffer containing the sensor brightness values and a buffer containing the camera calibration map, which can be used to correct lens distortion and other optical imperfections in the image data.
Get |ImageList|_ objects from either |Controller.images|_ or |Frame.images|_. The |Controller.images| function gives you the most recent images. |Frame.images| gives you the images associated with that frame. Since processing the frame takes a bit of time, the images from the frame will be at least one camera frame behind the images obtained from the controller. (In a future version, the data frame rate may be decoupled from the camera frame rate, so the difference could be larger.) Images from the controller have the smallest latency, but won’t match up as well to the tracking data of the current frame. When using |Controller.images|, you can implement the |Listener_onImages|_ callback in a |Listener|_ object. Your |Listener.onImages|_ callback is invoked by the Controller as soon as a new set of images is ready.
Image data is provided as an array of pixel values. The format of this data is reported by the Image.Format value. Currently, one format is in use. This “INFRARED” format uses one byte per pixel, defining the brightness measured for that sensor location. You can display infrared-format data as a greyscale image. Future Leap Motion hardware may provide sensor image data in a different format.
When a ray of light enters one of the Leap Motion cameras, the lens bends the light ray so that it hits the sensor, which records it as a greyscale brightness value at a specific pixel location. Of course, no lens is perfect, so a ray of light does not land on the sensor in the optically perfect spot. The calibration map provides data to correct this imperfection, allowing you to calculate the true angle of the original ray of light. You can use the corrected angle to generate a distortion-free image, and, using the angles from both images in the stereo pair, you can triangulate the 3D location of a feature identified in both images. Note that the calibration map corrects lens distortion; it does not correct perspective distortion.
For image correction, the distortion data can be fed to a shader program that can efficiently interpolate the correction applied to rays of light. For getting the true angle for a small set of points, you can use the |Image.warp|_ function (but this is not efficient enough to transform a full bitmap at a high frame rate).
The distortion data is based on the angle of view of the Leap Motion cameras. The Image class provides two values, Image.RayScaleX and Image.RayScaleY, that are proportional to view angles large enough to ensure that the distortion map covers the entire view, about 150 degrees for the current Leap Motion peripheral. A 150-degree angle of view means that a light ray passing through the lens has a maximum slope of about 4/1.
A view angle of 150 degrees corresponds to a slope of ±4 (the tangent of 75 degrees is approximately 3.73, which is rounded up to 4).
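The arithmetic behind that figure can be checked directly. The following minimal sketch converts a full angle of view to the slope of the outermost ray, which is the tangent of half the angle. The class and method names are illustrative only and are not part of the Leap Motion API.

```csharp
using System;

// Hypothetical helper, not part of the Leap Motion API.
public static class ViewAngle
{
    // Slope of the ray at the edge of a symmetric field of view.
    // The edge ray makes half the full view angle with the optical axis.
    public static double MaxSlope(double fullAngleDegrees)
    {
        double halfAngleRadians = (fullAngleDegrees / 2.0) * Math.PI / 180.0;
        return Math.Tan(halfAngleRadians);
    }
}
```

ViewAngle.MaxSlope(150.0) evaluates to roughly 3.73, which is the value that the text rounds to 4.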
The image above shows a reconstruction of the distortion-corrected image data. The brightness value of each pixel in the image originated from a ray of light entering the camera from a specific direction. The image is reconstructed by calculating the horizontal and vertical slopes represented by each pixel and finding the true brightness value from the image data using the calibration map. The red portions of the image represent areas within the rendering for which no brightness value is available (the actual field of view is less than 150 degrees).
The top of the image is always toward the negative direction of the z-axis of the Leap Motion coordinate system. By default, the Leap Motion software automatically adjusts the coordinate system so that hands enter from the positive direction of the z-axis. (Users can disable auto-orientation using the Leap Motion control panel.) Before hands are inserted into the field of view, it isn’t possible to know which way the images are oriented, since the user can typically place or mount the device in either physical orientation (i.e. with the green LED on the long side of the device facing one way or the other). If the user places the device in the opposite way than you expect, the images will be upside down until they put their hands into view (or turn the device itself around).
Before you can get image data, you must set the POLICY_IMAGES flag using the Controller.SetPolicy() function. For privacy reasons, each user must also enable the feature in the Leap Motion control panel for any application to get the raw camera images.
this.controller.SetPolicy(Controller.PolicyFlag.POLICY_IMAGES);
To get the image data, use either the |Controller.images|_ or the |Frame.images|_ function. Since the Leap Motion peripheral has two cameras, these functions return an |ImageList| object that contains two images (this could change in the future if multiple Leap Motion devices can be active at the same time). The image at index 0 is the left camera; the image at index 1 is the right camera. Note that the left-right orientation of the peripheral can be detected automatically based on the direction from which the user inserts his or her hand into the field of view. Detection is enabled by the auto-orientation setting in the Leap Motion control panel.
Once you have an Image object, you can get the 8-bit brightness values from the Data buffer. The length of this buffer is Image.Width times Image.Height times Image.BytesPerPixel. The width and height of the image change with the current operating mode of the controller, which can change from frame to frame. Note that in “robust mode,” the images are half as tall.
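As a minimal sketch of that layout, the helpers below compute the expected buffer length and the index of a given pixel. They assume that rows are stored one after another without padding, which matches how the rasterization example later in this section indexes the buffer. The names are illustrative and are not part of the Leap Motion API.

```csharp
using System;

// Hypothetical helpers, not part of the Leap Motion API.
public static class ImageBuffer
{
    // Width times height times bytes per pixel, as described in the text.
    public static int ExpectedLength(int width, int height, int bytesPerPixel)
    {
        return width * height * bytesPerPixel;
    }

    // Index of the first byte of pixel (x, y), assuming row-major storage
    // with no padding between rows.
    public static int PixelIndex(int x, int y, int width, int height, int bytesPerPixel)
    {
        if (x < 0 || x >= width || y < 0 || y >= height)
            throw new ArgumentOutOfRangeException("x", "pixel lies outside the image");
        return (y * width + x) * bytesPerPixel;
    }
}
```

Because the dimensions can change from frame to frame, it is safer to compute these values from the current Image than to cache them.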
The following example gets the image list from a frame and copies the brightness values from the Data buffer to a bitmap:
//Assumes frame is the current Leap.Frame and that it contains images
Leap.Image image = frame.Images [0]; //index 0 is the left camera
Bitmap bitmap = new Bitmap (image.Width, image.Height, System.Drawing.Imaging.PixelFormat.Format8bppIndexed);
//set palette
ColorPalette grayscale = bitmap.Palette;
for (int i = 0; i < 256; i++) {
    grayscale.Entries [i] = Color.FromArgb ((int)255, i, i, i);
}
bitmap.Palette = grayscale;
Rectangle lockArea = new Rectangle (0, 0, bitmap.Width, bitmap.Height);
BitmapData bitmapData = bitmap.LockBits (lockArea, ImageLockMode.WriteOnly, PixelFormat.Format8bppIndexed);
byte[] rawImageData = image.Data;
//A single copy is only valid when bitmapData.Stride equals image.Width (no row padding)
System.Runtime.InteropServices.Marshal.Copy (rawImageData, 0, bitmapData.Scan0, image.Width * image.Height);
bitmap.UnlockBits (bitmapData);
The calibration map can be used to correct image distortion due to lens curvature and other imperfections. The map is a 64x64 grid of points. Each point consists of two 32-bit values, so the buffer size is 128 times 64 times 4. You can get the calibration map buffer using the Image.Distortion function.
Each point in the buffer indicates where to find the corrected brightness value for the corresponding pixel in the raw image. Valid coordinates are normalized in the range [0..1]. Individual elements of the calibration map can have a value in the range [-0.6..1.7], but coordinates below zero or above 1 are invalid. Discard values outside the range [0..1] when using the calibration data.
To convert to pixel coordinates multiply by the width or height of the image. For pixels that lie in between the calibration grid points, you can interpolate between the nearest grid points. The camera lenses have a very large angle of view (roughly 150 degrees) and have a large amount of distortion. Because of this, not every point in the calibration grid maps to a valid pixel. The following rendering shows the lens correction data as color values. The left image shows the x values; the right side shows the y values.
The red values indicate map values that fall outside the image.
The size of the calibration map is subject to change in the future, so the Image class provides the grid dimensions with the DistortionWidth (actually twice the width to account for two values per grid point) and DistortionHeight functions. The length of the buffer containing the calibration data is DistortionWidth times DistortionHeight times 4 bytes.
The following example illustrates how to get the calibration data:
float[] distortionBuffer = image.Distortion;
//DistortionWidth already counts two values per grid point, so step through the buffer in pairs
for (int d = 0; d < image.DistortionWidth * image.DistortionHeight; d += 2) {
    float dX = distortionBuffer [d];
    float dY = distortionBuffer [d + 1];
    if (!((dX < 0) || (dX > 1)) && !((dY < 0) || (dY > 1))) {
        //Use valid distortion point
    }
}
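Once a point passes the validity test, converting it to pixel coordinates is a multiplication by the image dimensions. The following sketch combines the two steps. The class and method names are hypothetical, and the image dimensions are passed in as plain integers so that the code does not depend on the SDK.

```csharp
using System;

// Hypothetical helper, not part of the Leap Motion API.
public static class CalibrationPoint
{
    // Returns false for map values outside [0..1]; otherwise scales the
    // normalized coordinates to pixel coordinates in the raw image.
    public static bool TryToPixel(float dX, float dY, int imageWidth, int imageHeight,
                                  out float pixelX, out float pixelY)
    {
        pixelX = 0f;
        pixelY = 0f;
        if (dX < 0f || dX > 1f || dY < 0f || dY > 1f)
            return false;
        pixelX = dX * imageWidth;
        pixelY = dY * imageHeight;
        return true;
    }
}
```

A normalized value of exactly 1 scales to the image width or height, which is one past the last pixel index, so clamp the result before using it to index the Data buffer.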
Note that the calibration map seldom changes. It can change if you change devices, if you change the “optimize for HMD” setting, or if you recalibrate the device. In performance-critical applications, you can save a few CPU cycles by only updating distortion image textures occasionally. You should reread the distortion map when the user changes to a different device or the device changes the axis orientation. Both of these events trigger the |Listener.onDeviceChange|_ callback.
You can correct the raw image distortion in two ways:
The |Image_warp|_ and |Image_rectify|_ functions are the simpler method, but processing each pixel individually on the CPU is relatively slow. Use these functions if you are only correcting a few points, you don’t need to process data in real time, or when you cannot use GPU shaders. The Distortion buffer is designed to be used with a GPU shader program and can correct the entire raw image while maintaining a good application frame rate.
|Image.warp|_ takes a ray direction and returns the pixel coordinates into the raw image data that specify the brightness value recorded for that ray direction.
The following example uses the |Image_warp| function to rasterize the distortion-corrected image.
To create the image, the example creates a target bitmap and then, for each pixel, computes the direction of the ray of light that would strike the pixel through an ideal lens. The |Image_warp| function is then used to look up the correct brightness value.
//Draw the undistorted image using the RectilinearToPixel() function
int targetWidth = 400;
int targetHeight = 400;
Rectangle lockBounds = new Rectangle (0, 0, targetWidth, targetHeight);
Bitmap targetBitmap = new Bitmap (targetWidth, targetHeight, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
//Iterate over target image pixels, converting xy to ray slope
for (float y = 0; y < targetHeight; y++) {
    for (float x = 0; x < targetWidth; x++) {
        //Normalize from pixel xy to range [0..1]
        Vector input = new Vector (x / targetWidth, y / targetHeight, 0);
        //Convert from normalized [0..1] to slope [-4..4]
        input.x = (input.x - image.RayOffsetX) / image.RayScaleX;
        input.y = (input.y - image.RayOffsetY) / image.RayScaleY;
        //Use slope to get coordinates of point in image.Data containing the brightness for this target pixel
        Vector pixel = image.RectilinearToPixel (Leap.Image.PerspectiveType.STEREO_LEFT, input);
        if (pixel.x >= 0 && pixel.x < image.Width && pixel.y >= 0 && pixel.y < image.Height) {
            int dataIndex = (int)(Math.Floor (pixel.y) * image.Width + Math.Floor (pixel.x)); //xy to buffer index
            byte brightness = image.Data [dataIndex];
            targetBitmap.SetPixel ((int)x, (int)y, Color.FromArgb (brightness, brightness, brightness));
        } else {
            targetBitmap.SetPixel ((int)x, (int)y, Color.Red); //Display invalid pixels as red
        }
    }
}
The example uses Bitmap.SetPixel to draw each pixel in the image. This is not a particularly efficient way to draw the undistorted image, however, and the frame rate is quite low. For a more efficient approach, use OpenGL and shader programs.
A more efficient way to correct the entire image is to use a GPU shader program. Pass the image data to a fragment shader as a normal texture and the distortion data as encoded textures. You can then texture a quad by decoding the distortion data and using that to look up the correct brightness value in the image texture.
If a 32-bit-per-component texture format is not available on your target platform, you can use a separate texture for the x and y lookup values and encode the floating point values into multiple 8-bit color components. You then have to decode the values before using them to look up the raw brightness values.
A common method for encoding floating point data in a texture is to decompose the input value into four lower-precision values and then restore them in the shader. For example, you can encode a floating point number into a Color object that has four 8-bit components as follows:
Color encodeFloatRGBA(float input)
{
    input = (input + 0.6f) / 2.3f; //scale the input value to the range [0..1]
    float r = input;
    float g = input * 255;
    float b = input * 255 * 255;
    float a = input * 255 * 255 * 255;
    //keep only the fractional part of each component
    r = r - (float)Math.Floor(r);
    g = g - (float)Math.Floor(g);
    b = b - (float)Math.Floor(b);
    a = a - (float)Math.Floor(a);
    return new Color(r, g, b, a);
}
To recompose the value in the fragment shader, you look up the value in the texture and perform the reciprocal operation. To avoid losing too much precision, encode the x and y distortion values in separate textures. Once the distortion indices are sampled from the textures and decoded, you can look up the correct brightness value from the camera image texture.
uniform sampler2D texture;
uniform sampler2D vDistortion;
uniform sampler2D hDistortion;
varying vec2 distortionLookup;
varying vec4 vertColor;
varying vec4 vertTexCoord;
const vec4 decoderCoefficients = vec4(1.0, 1.0/255.0, 1.0/(255.0*255.0), 1.0/(255.0*255.0*255.0));
void main() {
    vec4 vEncoded = texture2D(vDistortion, vertTexCoord.st);
    vec4 hEncoded = texture2D(hDistortion, vertTexCoord.st);
    float vIndex = dot(vEncoded, decoderCoefficients) * 2.3 - 0.6;
    float hIndex = dot(hEncoded, decoderCoefficients) * 2.3 - 0.6;
    if(vIndex >= 0.0 && vIndex <= 1.0
       && hIndex >= 0.0 && hIndex <= 1.0)
    {
        gl_FragColor = texture2D(texture, vec2(hIndex, vIndex)) * vertColor;
    } else {
        gl_FragColor = vec4(1.0, 0, 0, 1.0); //show invalid pixels as red
    }
}
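A CPU-side round trip can help verify an encoding before it is uploaded as a texture. The sketch below is one possible variant and not the documented encoder: it stores the scaled value as four base-255 digits, one per 8-bit channel, which is the form that the dot product with decoderCoefficients reconstructs when each sampled channel equals the stored byte divided by 255. The class and method names are hypothetical. The constants -0.6 and 2.3 are taken from the decode step in the shader.

```csharp
using System;

// Hypothetical helper for checking the packing on the CPU; not part of any SDK.
public static class DistortionPacking
{
    const double RangeMin = -0.6;   // offset used by the shader's decode step
    const double RangeSpan = 2.3;   // scale used by the shader's decode step

    // Splits a distortion value into four base-255 digits (R, G, B, A).
    public static byte[] Encode(double value)
    {
        double v = (value - RangeMin) / RangeSpan;   // scale to [0..1]
        v = Math.Min(1.0, Math.Max(0.0, v));         // clamp out-of-range input
        byte[] channels = new byte[4];
        for (int i = 0; i < 4; i++)
        {
            v *= 255.0;
            double digit = Math.Floor(v);            // whole digit for this channel
            channels[i] = (byte)digit;
            v -= digit;                              // carry the remainder forward
        }
        return channels;
    }

    // Mirrors the shader: dot(sample, decoderCoefficients) * 2.3 - 0.6,
    // where each sampled channel is the stored byte divided by 255.
    public static double Decode(byte[] channels)
    {
        double coefficient = 1.0;
        double sum = 0.0;
        for (int i = 0; i < 4; i++)
        {
            sum += (channels[i] / 255.0) * coefficient;
            coefficient /= 255.0;
        }
        return sum * RangeSpan + RangeMin;
    }
}
```

Because every channel holds a whole digit, the value survives quantization to 8 bits per channel. The remaining error is bounded by the weight of the last digit, which is far below one pixel for the image sizes discussed here.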
In situations where shaders are not feasible you may be able to correct image distortion faster using well-optimized bilinear interpolation than when using the |Image_warp|_ function. (As with any such optimization, you should verify your results with performance testing.)
Recall that the distortion map contains a 64x64 element grid. Imagine these grid elements evenly spread out over your target image (with element [0, 0] in the lower-left corner and [63, 63] in the upper-right). Each element contains a horizontal coordinate and a vertical coordinate identifying where in the sensor image data to find the recorded brightness for that pixel in the target image. To find the brightness values for pixels in between the distortion grid elements, you have to interpolate between the four nearest grid points.
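The interpolation described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: the map is assumed to be stored row by row with interleaved x and y values (the same pairing used when reading the Distortion buffer earlier in this section), the target position is given as normalized coordinates in [0..1], and a lookup is rejected when any of the four surrounding grid points is invalid. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of the Leap Motion API.
public static class DistortionLookup
{
    // Interpolates the map at normalized target position (u, v).
    // gridSize is the number of grid points per side (64 for the current map).
    public static bool TrySample(float[] map, int gridSize, float u, float v,
                                 out float sourceX, out float sourceY)
    {
        sourceX = 0f;
        sourceY = 0f;
        float gx = Clamp01(u) * (gridSize - 1);   // position in grid units
        float gy = Clamp01(v) * (gridSize - 1);
        int x0 = (int)Math.Floor(gx);
        int y0 = (int)Math.Floor(gy);
        int x1 = Math.Min(x0 + 1, gridSize - 1);
        int y1 = Math.Min(y0 + 1, gridSize - 1);
        float tx = gx - x0;                       // fractional offsets in the cell
        float ty = gy - y0;

        int[] cols = { x0, x1, x0, x1 };
        int[] rows = { y0, y0, y1, y1 };
        float[] xs = new float[4];
        float[] ys = new float[4];
        for (int i = 0; i < 4; i++)
        {
            int index = (rows[i] * gridSize + cols[i]) * 2;   // assumed row-major, x then y
            xs[i] = map[index];
            ys[i] = map[index + 1];
            if (xs[i] < 0f || xs[i] > 1f || ys[i] < 0f || ys[i] > 1f)
                return false;                     // a corner falls outside the image
        }
        sourceX = Bilinear(xs, tx, ty);
        sourceY = Bilinear(ys, tx, ty);
        return true;
    }

    // corners are ordered (x0,y0), (x1,y0), (x0,y1), (x1,y1)
    static float Bilinear(float[] corners, float tx, float ty)
    {
        float near = corners[0] + (corners[1] - corners[0]) * tx;
        float far = corners[2] + (corners[3] - corners[2]) * tx;
        return near + (far - near) * ty;
    }

    static float Clamp01(float value)
    {
        return Math.Max(0f, Math.Min(1f, value));
    }
}
```

The returned coordinates are normalized. Multiply them by the image width and height to find the brightness value in the Data buffer.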
The base algorithm for finding the distortion-corrected brightness for a given pixel in the target image is:
It is reasonably straightforward to draw representations of the Leap Motion tracking data over the camera image. If you have drawn the raw image data to a bitmap, you can find the pixel corresponding to a Leap Motion position using the |Image_warp|_ function.
Converting a position in Leap Motion coordinates to horizontal and vertical slopes (from the camera perspective) requires knowing how far the cameras are from the origin of the Leap Motion coordinate system. For the current peripheral version, the offset on the x-axis is 20mm to either side. The cameras are on the x-axis, so there is no z offset. The slope is simply the distance from the camera in the image plane (the x-coordinate for the horizontal slope; the z-coordinate for the vertical slope) divided by the distance to the image plane, the y-coordinate. The following diagram illustrates the geometry for the horizontal slope:
The calculation is shown for the left camera; add the offset distance instead of subtracting for the right camera.
Once you know the ray slope values, you can get the pixel coordinates using |Image_warp|.
Note: The offset can be different for different form factors of the Leap Motion controller, but there is currently no way to get this value from the API.
The following example draws bright pixels at the finger tips in the image.
float cameraOffset = 20; //x-axis offset in millimeters
foreach (Hand hand in frame.Hands) {
    foreach (Finger finger in hand.Fingers) {
        Vector tip = finger.TipPosition;
        float hSlope = -(tip.x - cameraOffset) / tip.y; //For left camera
        float vSlope = tip.z / tip.y;
        Vector pixel = image.RectilinearToPixel (Leap.Image.PerspectiveType.STEREO_LEFT, new Vector (hSlope, vSlope, 0));
        //Draw tip at pixel
    }
}
If you have rendered the corrected image data, then correlating tracking data to the image data depends on how you rendered the image. For 3D scenes, this is a matter of using a consistent scale and correct placement of the textured quad showing the camera image. For other types of rendering, you must convert the ray slopes representing a Leap Motion position to a target image pixel according to the way that you corrected the image data.
The following example displays bright pixels at the finger tip positions on the distortion corrected image rendered in the example above.
int targetWidth = 400;
int targetHeight = 400;
float cameraXOffset = 20; //millimeters
foreach(Hand hand in frame.Hands){
    foreach (Finger finger in hand.Fingers) {
        Vector tip = finger.TipPosition;
        float hSlope = -(tip.x - cameraXOffset) / tip.y; //For left camera
        float vSlope = tip.z / tip.y;
        Vector ray = new Vector (hSlope * image.RayScaleX + image.RayOffsetX,
                                 vSlope * image.RayScaleY + image.RayOffsetY, 0);
        //Pixel coordinates from [0..1] to [0..width/height]
        Vector pixel = new Vector (ray.x * targetWidth, ray.y * targetHeight, 0);
        //Draw tip at pixel
    }
}
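The conversion used here is the inverse of the one used when rasterizing the corrected image: one divides by the ray scale after subtracting the offset, the other multiplies by the scale and adds the offset. The sketch below shows the pair as plain functions. The class and method names are hypothetical, and the offset of 0.5 and scale of 0.125 are assumed purely for illustration because they map [0..1] onto [-4..4]. In real code, read the values from the Image object.

```csharp
using System;

// Hypothetical helper, not part of the Leap Motion API.
public static class RayMapping
{
    // Assumed illustration values; real code reads RayOffsetX/Y and RayScaleX/Y from the image.
    public const float AssumedOffset = 0.5f;
    public const float AssumedScale = 0.125f;

    // Normalized target coordinate in [0..1] to ray slope.
    public static float NormalizedToSlope(float normalized, float offset, float scale)
    {
        return (normalized - offset) / scale;
    }

    // Ray slope back to a normalized target coordinate.
    public static float SlopeToNormalized(float slope, float offset, float scale)
    {
        return slope * scale + offset;
    }
}
```

With the assumed values, a normalized coordinate of 0 maps to a slope of -4 and a coordinate of 1 maps to a slope of 4, so a slope computed from tracking data lands on the same target pixel that the rasterizer filled for that ray.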
Get the direction to an image feature with the |Image.rectify|_ function. |Image.rectify| returns a vector containing the horizontal and vertical slopes (as defined from the camera point of view) given the pixel coordinates in the raw image data.
The following example illustrates how to use the slope data for a pair of pixels that represent the same point on an image feature. If you can match two pixels in the stereo image pair with sufficient accuracy, you can triangulate the 3D position using the set of slope values from the two camera images.
//horizontal_slope_* and vertical_slope_* are placeholders for the ray slopes of the matched feature
Vector left_camera_pixel = image.RectilinearToPixel(Leap.Image.PerspectiveType.STEREO_LEFT,
                               new Vector(horizontal_slope_left, vertical_slope_left, 0));
Vector right_camera_pixel = image.RectilinearToPixel(Leap.Image.PerspectiveType.STEREO_RIGHT,
                                new Vector(horizontal_slope_right, vertical_slope_right, 0));
Vector slopes_left = image.PixelToRectilinear(Leap.Image.PerspectiveType.STEREO_LEFT,
                         left_camera_pixel);
Vector slopes_right = image.PixelToRectilinear(Leap.Image.PerspectiveType.STEREO_RIGHT,
                          right_camera_pixel);
//Do the triangulation from the rectify() slopes
float cameraZ = 40/(slopes_right.x - slopes_left.x); //40mm is the distance between the two cameras
float cameraY = cameraZ * slopes_right.y;
float cameraX = cameraZ * slopes_right.x - 20; //20mm is the x-axis offset of each camera
Vector position = new Vector(cameraX, -cameraZ, cameraY);
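To see why the horizontal part of this triangulation works, it can be run against slopes generated with the formulas from the fingertip examples. The sketch below is a self-contained check under those assumptions: it uses the 20mm camera offset quoted above, covers only the x and y coordinates, and uses hypothetical class and method names that are not part of the Leap Motion API.

```csharp
using System;

// Hypothetical helper, not part of the Leap Motion API.
public static class StereoGeometry
{
    public const float CameraOffset = 20f;   // x-axis offset in millimeters

    // Horizontal slopes as computed in the fingertip examples.
    public static float LeftSlope(float x, float y)
    {
        return -(x - CameraOffset) / y;
    }

    public static float RightSlope(float x, float y)
    {
        return -(x + CameraOffset) / y;
    }

    // Recovers x and y from a matched pair of horizontal slopes.
    // The slope difference is -2 * CameraOffset / y, so the depth follows from
    // the baseline (twice the offset) divided by that difference.
    public static void Triangulate(float slopeLeft, float slopeRight, out float x, out float y)
    {
        float cameraZ = (2f * CameraOffset) / (slopeRight - slopeLeft);
        x = cameraZ * slopeRight - CameraOffset;
        y = -cameraZ;
    }
}
```

For a point at x = 30 and y = 200, the slopes are -0.05 and -0.25, and the triangulation returns the original coordinates. When the two slopes are equal, the rays are parallel and the division has no finite result, so guard against a zero difference when matching features in real images.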
The Leap Motion service/daemon software provides a mode that optimizes tracking when the Leap Motion hardware is attached to a head-mounted display. In this mode, the Leap Motion software expects to view hands from the top rather than the bottom. When ambiguity exists whether the palm of a hand is facing toward or away from the Leap Motion sensors, setting this mode makes it more likely that the software will initialize the hand model so that it is facing away from the sensors. Thus this mode is good for mounting the Leap Motion device on the face of a head-mounted display rig.
To turn on the mode in your application, enable the optimize HMD policy:
controller.SetPolicy(Controller.PolicyFlag.POLICY_OPTIMIZE_HMD);
The policy is always denied for hardware that cannot be mounted on an HMD, such as devices embedded in laptops or keyboards.