daniel shiffman

Getting Started with Kinect and Processing


Kinect and Processing

The Microsoft Kinect sensor is a peripheral device (designed for Xbox and Windows PCs) that functions much like a webcam. However, in addition to providing an RGB image, it also provides a depth map: for every pixel seen by the sensor, the Kinect measures the distance from the sensor. This makes a variety of computer vision problems, like background removal and blob detection, easy and fun!

The Kinect sensor itself only measures color and depth. However, once that information is on your computer, lots more can be done, like "skeleton" tracking (i.e. detecting a model of a person and tracking their movements). To do skeleton tracking you'll need to use Thomas Lengeling's Windows-only Kinect v2 Processing library. However, if you're on a Mac and all you want is raw data from the Kinect, you are in luck! This library uses the libfreenect and libfreenect2 open source drivers to access that data on Mac OS X (Windows support coming soon).

(At the time of this writing, Kinect v2 support as well as other issues are in progress. Chime in on GitHub if you’d like to help!)

What hardware do I need?

First you need a “stand-alone” Kinect. You do not need to buy an Xbox.

Some additional notes about different models:

SimpleOpenNI

You could also consider using the SimpleOpenNI library and reading Greg Borenstein’s Making Things See book. OpenNI has features (skeleton tracking, gesture recognition, etc.) that are not available in this library. Unfortunately, PrimeSense, the company behind OpenNI, was recently purchased by Apple and, while I thought OpenNI was shut down, there appear to be some efforts to revive it! It's unclear what the future of OpenNI and SimpleOpenNI will be.

I’m ready to get started right now

The new updated version of the library is in progress. Until it is available via the Processing contributions manager, you'll need to get it here and install the library manually.

The easiest way to install the library is with the contributions manager: Sketch → Import Library → Add Library, then search for "Kinect". A button labeled "Install" will appear. If you want to install it manually, download the most recent release and extract it in the libraries folder. Restart Processing, open up one of the examples in the examples folder, and you are good to go!

What is Processing?

Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.

What if I don’t want to use Processing?

If you are comfortable with C++, I suggest you consider using openFrameworks or Cinder with the Kinect. These environments have some additional features, and you may also get a C++ speed advantage when processing the depth data, etc.

What code do I write?

The first thing to do is include the proper import statement at the top of your code:

import org.openkinect.processing.*;

You also need a reference to a Kinect object, i.e.

Kinect kinect;

Then in setup() you can initialize that kinect object:

void setup() {
  kinect = new Kinect(this);
}

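Once you’ve done this, you can begin to access data from the Kinect sensor. Currently, the library makes data available to you in five ways: the RGB image, the IR image, the grayscale depth image, the color depth image, and the raw depth data.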

Let’s look at these one at a time. If you want to use the Kinect just like a regular old webcam, you can request that the RGB image is captured:

kinect.startVideo();

Then you simply ask for the image as a PImage!

PImage img = kinect.getVideoImage(); 
image(img, 0, 0);

You can simply ask for this image in draw(); however, you can also use videoEvent() to know when a new image is available.

void videoEvent(Kinect k) {
  // There has been a video event!
}

If you want the IR image:

kinect.setIR(true);

You cannot get both the RGB image and the IR image at the same time. They are both passed back via getVideoImage(), so whichever one was most recently enabled is the one you will get.

Now, if you want the depth image, you can:

kinect.startDepth();

and request the grayscale image:

PImage img = kinect.getDepthImage();
image(img, 0, 0);

As well as the raw depth data:

int[] depth = kinect.getRawDepth();

For the color depth image, use kinect.setColorDepth(true). And just like with the video image, there's a depth event you can access if necessary.

void depthEvent(Kinect k) {
  // There has been a depth event!
}
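The array returned by getRawDepth() is flat, so a pixel at (x, y) has to be looked up by offset. Here is a minimal sketch of that indexing in plain Java, assuming the array is stored row by row (the same x + y * width layout the Point Cloud example further down relies on). The class and method names are hypothetical helpers for illustration, not part of the library.

```java
// Hypothetical helper, not part of the Kinect library: works on any flat,
// row-major depth array such as the one returned by getRawDepth().
class RawDepthGrid {
  // Index of pixel (x, y) in a flat array that stores one row after another.
  static int offset(int x, int y, int width) {
    return x + y * width;
  }

  // Returns {x, y} of the pixel with the smallest raw depth value,
  // or null if the array is empty. Smaller raw values mean closer objects.
  static int[] closestPixel(int[] depth, int width) {
    if (depth.length == 0) {
      return null;
    }
    int best = 0;
    for (int i = 1; i < depth.length; i++) {
      if (depth[i] < depth[best]) {
        best = i;
      }
    }
    // Convert the flat index back into column and row.
    return new int[] { best % width, best / width };
  }
}
```

In a sketch you would pass in kinect.getRawDepth() and the width of the depth image; the helper itself has no dependency on Processing.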

Finally, you can also adjust the camera angle with the tilt() function.

float angle = kinect.getTilt();
angle = angle + 1;
kinect.tilt(angle);

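So, there you have it, here are all the useful functions you might need to use the Processing Kinect library: startVideo(), getVideoImage(), setIR(), startDepth(), getDepthImage(), getRawDepth(), setColorDepth(), getTilt(), and tilt().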

For everything else, you can also take a look at the javadoc reference.

Examples

There are four basic examples:

Display RGB, IR, and Depth Images

Code: RGBDepthTest

This example uses all of the above listed functions to display the data from the kinect sensor.

Point Cloud

Code: PointCloud

Here, we’re doing something a bit fancier. Number one, we’re using the 3D capabilities of Processing to draw points in space. You’ll want to familiarize yourself with translate(), rotate(), pushMatrix(), popMatrix(). This tutorial is also a good place to start. In addition, the example uses a PVector to describe a point in 3D space. More here: PVector tutorial.

The real work of this example, however, doesn’t come from me at all. The raw depth values from the kinect are not directly proportional to physical depth. Rather, they scale with the inverse of the depth according to this formula:

depthInMeters = 1.0 / (rawDepth * -0.0030711016 + 3.3309495161);

Rather than do this calculation all the time, we can precompute all of these values in a lookup table since there are only 2048 depth values.

float[] depthLookUp = new float[2048];
for (int i = 0; i < depthLookUp.length; i++) {
  depthLookUp[i] = rawDepthToMeters(i);
}

float rawDepthToMeters(int depthValue) {
  if (depthValue < 2047) {
    return (float)(1.0 / ((double)(depthValue) * -0.0030711016 + 3.3309495161));
  }
  return 0.0f;
}

Thanks to Matthew Fisher for the above formula. (Note: for the results to be more accurate, you would need to calibrate your specific kinect device, but the formula is close enough for me so I'm sticking with it for now. More about calibration in a moment.)
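One property of the formula is worth keeping in mind: its denominator reaches zero at a raw value of roughly 1084.6, so larger raw values produce negative "distances". The following is a minimal sketch of a lookup table that guards against this, in plain Java. The extra guard (treating those readings as 0, meaning "no data") is an assumption of this sketch rather than something the original example does, and the class name is hypothetical.

```java
// Hypothetical variant of the lookup table above, with one added guard.
class DepthTable {
  static final int RAW_LEVELS = 2048; // 11-bit raw depth values: 0..2047

  // Same formula as above. Assumption of this sketch: readings whose
  // denominator is not positive are reported as 0 ("no data").
  static float rawDepthToMeters(int raw) {
    double denominator = raw * -0.0030711016 + 3.3309495161;
    if (raw >= 2047 || denominator <= 0) {
      return 0.0f;
    }
    return (float) (1.0 / denominator);
  }

  // Precompute the conversion once for every possible raw value.
  static float[] buildTable() {
    float[] table = new float[RAW_LEVELS];
    for (int i = 0; i < table.length; i++) {
      table[i] = rawDepthToMeters(i);
    }
    return table;
  }
}
```

With this table, a raw reading of 800 maps to a little over 1.1 meters, while anything at or beyond the zero crossing maps to 0 and can be skipped when drawing.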

Finally, we can draw some points based on the depth values in meters:

for (int x = 0; x < w; x += skip) {
  for (int y = 0; y < h; y += skip) {
    int offset = x + y * kinect.width;

    // Convert kinect data to world xyz coordinate
    int rawDepth = depth[offset];
    PVector v = depthToWorld(x, y, rawDepth);

    stroke(255);
    pushMatrix();
    // Scale up by 200
    float factor = 200;
    translate(v.x * factor, v.y * factor, factor - v.z * factor);
    // Draw a point
    point(0, 0);
    popMatrix();
  }
}
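The loop above calls depthToWorld(), which is not shown here. One plausible way to write such a function is a standard pinhole-camera back-projection, sketched below in plain Java. The focal lengths and image center are assumptions: they are rounded from commonly cited calibration figures for the first-generation Kinect depth camera, and your device will differ. The sketch returns a double array instead of a PVector so that it stays self-contained.

```java
// A sketch of a pinhole back-projection, not necessarily what the example uses.
class DepthProjection {
  // Assumed depth-camera intrinsics in pixels (placeholder values,
  // replace them with your own calibration results).
  static final double FX = 594.21; // focal length, x
  static final double FY = 591.04; // focal length, y
  static final double CX = 339.31; // image center, x
  static final double CY = 242.74; // image center, y

  // Depth formula from the text; 2047 is treated as "no reading".
  static double rawDepthToMeters(int raw) {
    if (raw < 2047) {
      return 1.0 / (raw * -0.0030711016 + 3.3309495161);
    }
    return 0.0;
  }

  // Maps pixel (x, y) with a raw depth reading to {x, y, z} in meters,
  // relative to the depth camera.
  static double[] depthToWorld(int x, int y, int rawDepth) {
    double z = rawDepthToMeters(rawDepth);
    double worldX = (x - CX) * z / FX;
    double worldY = (y - CY) * z / FY;
    return new double[] { worldX, worldY, z };
  }
}
```

A pixel near the image center maps to x and y close to zero, and pixels to the left of the center get negative x values, which is why the drawing code can translate directly by the scaled coordinates.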

Average Point Tracking

The real magic of the Kinect lies in its computer vision capabilities. With depth information, you can do all sorts of fun things, like saying: "the background is anything beyond 5 feet. Ignore it!" Without depth, background removal involves all sorts of painstaking pixel comparisons. As a quick demonstration of this idea, here is a very basic example that computes the average xy location of any pixels in front of a given depth threshold.

Source: AveragePointTracking

In this example, I declare two variables to add up all the appropriate x's and y's and one variable to keep track of how many there are.

float sumX = 0;
float sumY = 0;
float count = 0;

Then, whenever we find a given point that complies with our threshold, I add the x and y to the sum:

if (rawDepth < threshold) {
  sumX += x;
  sumY += y;
  count++;
}

When we're done, we calculate the average and draw a point!

if (count != 0) {
  float avgX = sumX / count;
  float avgY = sumY / count;
  fill(255, 0, 0);
  ellipse(avgX, avgY, 16, 16);
}
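Put together, the three fragments above amount to one pass over the depth array. Here is a minimal self-contained sketch of that pass in plain Java, assuming a flat row-major depth array; the class and method names are hypothetical, and the drawing step is left out so the logic can be run without a Kinect attached.

```java
// Hypothetical helper that gathers the averaging steps into one method.
class AveragePoint {
  // Returns {avgX, avgY} of all pixels whose raw depth is below the
  // threshold, or null when no pixel qualifies.
  static float[] average(int[] depth, int width, int height, int threshold) {
    float sumX = 0;
    float sumY = 0;
    float count = 0;
    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        int rawDepth = depth[x + y * width];
        if (rawDepth < threshold) {
          sumX += x;
          sumY += y;
          count++;
        }
      }
    }
    if (count == 0) {
      return null; // nothing in front of the threshold
    }
    return new float[] { sumX / count, sumY / count };
  }
}
```

In a sketch, the returned pair would be what gets passed to ellipse(), and a null result means there is nothing to draw for that frame.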

Why don’t the RGB images and depth values correspond properly?

Unfortunately, because the RGB camera and the IR camera are not physically located in the same spot, we have a stereo vision problem. Pixel XY in one image is not the same XY in an image from a camera an inch to the right. Tools for calibrating these two images are on my to-do list. For more: Theory on depth/color calibration and registration.

What’s missing?

FAQ

  1. Why are there shadows in the depth image? Kinect Shadow diagram
  2. What is the range of depth that the Kinect can see? ~0.7–6 meters or 2.3–20 feet. Note you will get black pixels (or a raw depth value of 2047) for elements that are either too far away or too close.
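To relate that range to raw readings, the depth formula from the Point Cloud section can be inverted. The sketch below does this in plain Java. It is only as accurate as the uncalibrated formula itself, and the class and method names are hypothetical.

```java
// Hypothetical helper: converts between meters and raw depth values
// using the approximate formula from the Point Cloud section.
class DepthRange {
  // Forward formula: raw depth value to meters.
  static double rawDepthToMeters(double raw) {
    return 1.0 / (raw * -0.0030711016 + 3.3309495161);
  }

  // Inverse: solve the formula above for the raw value.
  static double metersToRawDepth(double meters) {
    return (1.0 / meters - 3.3309495161) / -0.0030711016;
  }
}
```

Under this formula, the usable range of about 0.7 to 6 meters corresponds to raw values of roughly 619 to 1030, and a "5 feet" (1.524 meter) background cutoff works out to a raw threshold of roughly 871.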