Daniel Shiffman

Getting Started with Kinect and Processing

So, you want to use the Kinect in Processing. Great. This page will serve to document the current state of my Processing Kinect library, with some tips and info.

The current state of affairs

Since the Kinect launched in November 2010, several models have been released. Here's a quick list of what is out there and what is supported in Processing for Mac OS X.

Now, before you proceed, you could also consider using the SimpleOpenNI library and reading Greg Borenstein’s book Making Things See. OpenNI has lots of features (skeleton tracking, gesture recognition, etc.) that are not available in this library. Unfortunately, PrimeSense, the company behind OpenNI, was recently purchased by Apple, and OpenNI appears to be shutting down. The future of OpenNI and SimpleOpenNI is unclear.

I’m ready to get started right now

What hardware do I need?

First you need a “stand-alone” Kinect (model 1414 only for now!). You do not need to buy an Xbox. If you get the stand-alone Kinect listed below, it will come with a USB adapter and power supply.

Standalone Kinect Sensor

If you have an older Kinect that came with an Xbox, it will not include the USB adapter. You’ll need to purchase this separately:

Kinect Sensor Power Supply

Um, what is Processing?

I’m going to assume you are familiar with Processing, but just in case you are not, I suggest checking out processing.org. Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing has also evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.

What if I don’t want to use Processing?

If you are comfortable with C++, I suggest you consider using openFrameworks or Cinder with the Kinect. At the time of this writing, these environments have some features I haven’t implemented yet, and you also get a C++ speed advantage when processing the depth data, etc.:

Kinect CinderBlock

More resources from: The OpenKinect Project

I’ve got Processing, how do I download and install the library?

By default Processing will create a “sketchbook” folder in your Documents directory; for example, on my machine it’s:


The easiest way to install the library is to go to Sketch --> Import Library --> Add Library and search for "Kinect". Select the library in the results and click the "Install" button that appears.

If you want to install it manually, find the libraries folder inside your sketchbook (create it if it isn't already there). On my machine it's:


Then go and download the most recent release and extract it in the libraries folder. Restart Processing, open up one of the examples in the examples folder and you are good to go!

What code do I write?

To get started using the library, you need to include the proper import statements at the top of your code:

import org.openkinect.*;
import org.openkinect.processing.*;

You also need a reference to a Kinect object:

// Kinect Library object
Kinect kinect;

Then in setup() you can initialize that kinect object:

kinect = new Kinect(this);

Once you’ve done this you can begin to access data from the kinect sensor.

Currently, the library makes data available to you in four ways:

  1. RGB image from the Kinect camera as a PImage
  2. Grayscale image from the IR camera as a PImage
  3. Grayscale image with each pixel’s brightness mapped to depth (brighter = closer) as a PImage
  4. Raw depth data (11-bit numbers between 0 and 2047) as an int[] array
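
To make item 3 easier to picture, here is a minimal Java sketch of one way a raw 11-bit value could be turned into a gray level where brighter means closer. The class and method names are hypothetical. The simple linear mapping is an assumption for illustration and may not match what the library does internally.

```java
// Hypothetical helper: maps one raw 11-bit depth value to an 8-bit gray level
// (brighter = closer). A simple linear mapping for illustration only; this is an
// assumption, not necessarily the library's internal mapping.
final class DepthToGray {
    // Largest 11-bit value; the Kinect reports it when there is no depth reading
    static final int MAX_RAW = 2047;

    static int toBrightness(int rawDepth) {
        // Clamp into the valid 11-bit range first
        int raw = Math.max(0, Math.min(MAX_RAW, rawDepth));
        // Invert so small raw values (near) are bright and 2047 (no reading) is black
        return 255 - (raw * 255) / MAX_RAW;
    }

    public static void main(String[] args) {
        System.out.println(toBrightness(0));    // 255
        System.out.println(toBrightness(1023)); // 128
        System.out.println(toBrightness(2047)); // 0
    }
}
```

Because 2047 maps to black, pixels with no depth reading come out black in this mapping.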

Let’s look at these one at a time. If you want to use the Kinect just like a regular old webcam, you can request that the RGB image be captured:

kinect.enableRGB(true);

Then you simply ask for the image as a PImage!

PImage img = kinect.getVideoImage();

Alternatively, you can enable the IR image:

kinect.enableIR(true);

Currently, you cannot have both the RGB image and the IR image. They are both passed back via getVideoImage() so whichever one was most recently enabled is the one you will get.

Now, if you want the depth image, you first enable depth tracking:

kinect.enableDepth(true);

and request the grayscale image:

PImage img = kinect.getDepthImage();

As well as the raw depth data:

int[] depth = kinect.getRawDepth();
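
The raw depth array is one long list of numbers, one per pixel, stored row by row. The PointCloud example below indexes it as x + y*w. Here is a minimal Java sketch showing how to read the value at a given pixel. The depthAt() helper is hypothetical, and a made-up 3x2 array stands in for the real 640x480 data.

```java
// Sketch: reading one pixel's raw depth out of a flat, row-by-row int[] array.
// depthAt() is a hypothetical helper, not part of the Kinect library.
final class DepthIndexing {
    // Returns the raw depth at column x, row y of an image that is `width` pixels wide
    static int depthAt(int[] depth, int width, int x, int y) {
        // Row y starts at index y * width; x is the offset within that row
        return depth[x + y * width];
    }

    public static void main(String[] args) {
        // A made-up 3x2 "depth image" (the real one is 640x480)
        int[] depth = {
            700, 710, 720,   // row 0
            800, 810, 820    // row 1
        };
        System.out.println(depthAt(depth, 3, 2, 1)); // 820
    }
}
```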

If you are looking at the raw depth data only, you can turn off the library’s behind-the-scenes depth image processing to make it slightly more efficient:

kinect.processDepthImage(false);

Finally, you can also adjust the camera angle with the tilt() function:

float deg = 15;
kinect.tilt(deg);

So there you have it. Here are all the useful functions you might need to use the Processing Kinect library:

  1. enableRGB(boolean) — turn on or off the RGB camera image
  2. enableIR(boolean) — turn on or off the IR camera image
  3. enableDepth(boolean) — turn on or off the depth tracking
  4. processDepthImage(boolean) — turn on or off the depth image processing
  5. PImage getVideoImage() — grab the RGB or IR video image
  6. PImage getDepthImage() — grab the grayscale depth map image
  7. int[] getRawDepth() — grab the raw depth data
  8. tilt(float) — adjust the camera angle (between 0 and 30 degrees)

For everything else, you can also take a look at the javadoc reference.

So now what?

So far, I only have three basic examples:

Display RGB, IR, and Depth Images



This example does nothing but use all of the above listed functions to display the data from the kinect sensor.

Point Cloud

Code: PointCloud

Here, we’re doing something a bit fancier. First, we’re using the 3D capabilities of Processing to draw points in space. You’ll want to familiarize yourself with translate(), rotate(), pushMatrix(), and popMatrix(). This tutorial is also a good place to start. In addition, the example uses a PVector to describe a point in 3D space. More here: PVector tutorial.

The real work of this example, however, doesn’t come from me at all. The raw depth values from the Kinect are not directly proportional to physical depth. Rather, the inverse of the depth is approximately a linear function of the raw value, according to this formula:

depthInMeters = 1.0 / (rawDepth * -0.0030711016 + 3.3309495161);

Rather than do this calculation all the time, we can precompute all of these values in a lookup table since there are only 2048 depth values.

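// Declared at the top of the sketch
float[] depthLookUp = new float[2048];

// In setup(): fill the table once
for (int i = 0; i < depthLookUp.length; i++) {
  depthLookUp[i] = rawDepthToMeters(i);
}

// Convert a raw 11-bit depth value to meters (2047 means "no reading")
float rawDepthToMeters(int depthValue) {
  if (depthValue < 2047) {
    return (float)(1.0 / ((double)(depthValue) * -0.0030711016 + 3.3309495161));
  }
  return 0.0f;
}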

Thanks to Matthew Fisher for the above formula. (Note: for the results to be more accurate, you would need to calibrate your specific kinect device, but the formula is close enough for me so I'm sticking with it for now. More about calibration in a moment.)
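
To check the numbers outside of Processing, here is a plain-Java sketch of the same lookup table. The class name DepthLookup is hypothetical. The formula and the 2047 cutoff are the ones from the code above.

```java
// Plain-Java sketch of the raw-depth -> meters lookup table (class name is hypothetical).
final class DepthLookup {
    // One entry per possible 11-bit raw value
    static final float[] TABLE = new float[2048];

    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = rawDepthToMeters(i);
        }
    }

    // Matthew Fisher's approximation; 2047 ("no reading") is mapped to 0
    static float rawDepthToMeters(int depthValue) {
        if (depthValue < 2047) {
            return (float) (1.0 / (depthValue * -0.0030711016 + 3.3309495161));
        }
        return 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(TABLE[500]);  // roughly 0.557 (meters)
        System.out.println(TABLE[1000]); // roughly 3.85 (meters)
    }
}
```

One thing this makes easy to see is that the formula's denominator crosses zero between raw values 1084 and 1085, so the table holds negative numbers above that point. Under this formula, the usable range of roughly 0.7–6 meters corresponds to raw values of about 620 to 1030. In practice, then, you would mostly be reading entries well below that point.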

Finally, we can draw some points based on the depth values in meters:

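// Raw depth for every pixel of the 640x480 depth image
int[] depth = kinect.getRawDepth();
int w = 640;
int h = 480;
// Only look at every 4th pixel in each direction to keep things fast
int skip = 4;

for (int x = 0; x < w; x += skip) {
  for (int y = 0; y < h; y += skip) {
    int offset = x + y*w;

    // Convert kinect data to world xyz coordinate
    int rawDepth = depth[offset];
    PVector v = depthToWorld(x, y, rawDepth);

    // Scale up by 200
    float factor = 200;
    // Draw a point (z is flipped so that closer points end up nearer the viewer)
    stroke(255);
    point(v.x*factor, v.y*factor, factor - v.z*factor);
  }
}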
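
The point-drawing loop calls a depthToWorld() helper that isn't shown here. Below is a minimal Java sketch of what such a helper might look like. It uses a standard pinhole camera model, in which a pixel's offset from the image center, scaled by depth and divided by the focal length, gives its real-world x and y. This version is an assumption in two ways. First, it takes depth already converted to meters (for example, from the lookup table) instead of a raw value. Second, its focal lengths and image center are approximate Kinect v1 depth-camera values; numbers of this size have been published (for example, in Nicolas Burrus' calibration notes), but your own device will differ slightly.

```java
// Hypothetical depthToWorld(): pinhole back-projection of a depth pixel into meters.
// ASSUMPTIONS: approximate Kinect v1 depth-camera intrinsics for a 640x480 image,
// and depth passed in meters rather than as a raw value.
final class DepthToWorld {
    // Assumed focal lengths, in pixels
    static final double FX = 594.21;
    static final double FY = 591.04;
    // Assumed principal point (roughly the image center), in pixels
    static final double CX = 339.31;
    static final double CY = 242.74;

    record Point3(double x, double y, double z) {}

    static Point3 depthToWorld(int x, int y, double meters) {
        // Offset from the principal point, scaled by depth / focal length
        double wx = (x - CX) * meters / FX;
        double wy = (y - CY) * meters / FY;
        return new Point3(wx, wy, meters);
    }

    public static void main(String[] args) {
        // A pixel right of center, 2 meters away
        System.out.println(depthToWorld(600, 240, 2.0));
    }
}
```

To use something like this in the sketch, you could pass depthLookUp[rawDepth] as the depth and copy x, y, and z into a PVector.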

Average Point Tracking

The real magic of the Kinect lies in its computer vision capabilities. With depth information, you can do all sorts of fun things, like saying: "The background is anything beyond 5 feet. Ignore it!" Without depth, background removal involves all sorts of painstaking pixel comparisons. As a quick demonstration of this idea, here is a very basic example that computes the average xy location of any pixels closer than a given depth threshold.

Source: AveragePointTracking
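
It helps to see how a threshold like "5 feet" turns into a raw depth number. The PointCloud formula says meters = 1 / (raw * A + B), which can be solved for raw: raw = (1 / meters - B) / A. The Java sketch below does exactly that. The class and method names are hypothetical, and the result inherits the accuracy limits of the original formula.

```java
// Sketch: turning a distance threshold in meters into a raw-depth threshold
// by inverting the rawDepth -> meters formula (names are hypothetical).
final class DepthThreshold {
    static final double A = -0.0030711016;
    static final double B = 3.3309495161;

    // meters = 1 / (raw * A + B)  =>  raw = (1 / meters - B) / A
    static double metersToRaw(double meters) {
        return (1.0 / meters - B) / A;
    }

    // Foreground = closer than the threshold. Any threshold computed this way is
    // below 2047, so "no reading" pixels are never counted as foreground.
    static boolean isForeground(int rawDepth, int rawThreshold) {
        return rawDepth < rawThreshold;
    }

    public static void main(String[] args) {
        // "The background is anything beyond 5 feet" (5 ft = 1.524 m)
        int threshold = (int) Math.round(metersToRaw(1.524));
        System.out.println(threshold);                     // 871
        System.out.println(isForeground(700, threshold));  // true
        System.out.println(isForeground(2047, threshold)); // false
    }
}
```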

In this example, we declare two variables to add up all the appropriate x's and y's and one variable to keep track of how many there are.

float sumX = 0;
float sumY = 0;
float count = 0;

Then, whenever we find a point that is closer than our threshold, we add its x and y to the sums and increment the count:

if (rawDepth < threshold) {
  sumX += x;
  sumY += y;
  count++;
}

When we're done, we calculate the average and draw a point!

if (count != 0) {
  float avgX = sumX/count;
  float avgY = sumY/count;
  // Draw a circle at the average location
  fill(255, 0, 0);
  ellipse(avgX, avgY, 16, 16);
}
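
Here is a minimal plain-Java sketch of the same averaging logic, run on a made-up depth array instead of live Kinect data. The class, the Avg record, and the track() method are hypothetical names. Returning null when no pixel qualifies plays the role of the count != 0 check.

```java
// Plain-Java sketch of average point tracking on an int[] of raw depth values.
// Class, record, and method names are hypothetical.
final class AveragePoint {
    record Avg(float x, float y) {}

    // Average (x, y) of all pixels closer than threshold, or null if none qualify
    static Avg track(int[] depth, int w, int h, int threshold) {
        float sumX = 0, sumY = 0, count = 0;
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                int rawDepth = depth[x + y * w];
                if (rawDepth < threshold) {
                    sumX += x;
                    sumY += y;
                    count++;
                }
            }
        }
        // Avoid dividing by zero when nothing is in front of the threshold
        if (count == 0) return null;
        return new Avg(sumX / count, sumY / count);
    }

    public static void main(String[] args) {
        // Made-up 4x3 depth image with two "near" pixels at (1,1) and (3,1)
        int[] depth = {
            2047, 2047, 2047, 2047,
            2047,  600, 2047,  650,
            2047, 2047, 2047, 2047
        };
        System.out.println(track(depth, 4, 3, 800)); // Avg[x=2.0, y=1.0]
    }
}
```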

Why don’t the RGB images and depth values correspond properly?

Unfortunately, because the RGB camera and the IR camera are not physically located in the same spot, we have a stereo vision problem. Pixel XY in one image is not the same XY in an image from a camera an inch to the right. I’m hoping to stretch my brain to try to understand this better and work out some examples that calibrate the data in Processing. Stay tuned!
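
To get a feel for the size of the problem, here is a rough Java sketch based on the simplest stereo model. In that model, two identical, parallel cameras a baseline b apart see a point at depth Z shifted by about f * b / Z pixels. The focal length (594 px) and the one-inch baseline are assumptions. The real Kinect RGB and IR cameras also differ in resolution, field of view, and slight rotation, which this sketch ignores.

```java
// Rough illustration of stereo parallax: shift ~= f * b / Z pixels.
// ASSUMPTIONS: identical parallel cameras, f = 594 px, baseline b = 1 inch.
final class StereoShift {
    static final double FOCAL_PX = 594.0;   // assumed focal length, in pixels
    static final double BASELINE_M = 0.0254; // assumed camera separation ("an inch")

    static double shiftPixels(double depthMeters) {
        return FOCAL_PX * BASELINE_M / depthMeters;
    }

    public static void main(String[] args) {
        for (double z : new double[] {0.8, 1.5, 3.0}) {
            System.out.printf("depth %.1f m -> shift %.1f px%n", z, shiftPixels(z));
        }
    }
}
```

The takeaway is that the offset depends on depth. Nearby objects shift more than distant ones, so no single fixed pixel offset can line the two images up.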

If you are interested in more (and in software that will do this very job!), check out Nicolas Burrus’ amazing work:

Theory on depth/color calibration and registration
version 0.3 of RGBDemo

What’s missing?

Lots! Open a GitHub issue if you want to add an item to my to-do list!



1. Why are there shadows in the depth image?

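Kinect Shadow diagram

The IR projector and the IR camera sit a few centimeters apart, so a foreground object can block the projected dot pattern from reaching parts of the background that the camera can still see. Those pixels get no depth reading, so they show up as black "shadows" next to the object.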

2. What is the range of depth that the kinect can see?

~0.7–6 meters or 2.3–20 feet. Note that you will get black pixels (a raw depth value of 2047) for points that are either too far away or too close.
