The Microsoft Kinect sensor is a peripheral device (designed for Xbox and Windows PCs) that functions much like a webcam. However, in addition to providing an RGB image, it also provides a depth map: for every pixel seen by the sensor, the Kinect measures the distance from the sensor. This makes a variety of computer vision problems, like background removal and blob detection, easy and fun!
The Kinect sensor itself only measures color and depth. However, once that information is on your computer, a lot more can be done, like “skeleton” tracking (i.e. detecting a model of a person and tracking his/her movements). To do skeleton tracking, you’ll need to use Thomas Lengeling’s Windows-only Kinect v2 Processing library. However, if you’re on a Mac and all you want is raw data from the Kinect, you are in luck! This library uses the libfreenect and libfreenect2 open source drivers to access that data on Mac OS X (Windows support coming soon).
First you need a “stand-alone” Kinect. You do not need to buy an Xbox.
You could also consider using the SimpleOpenNI library and reading Greg Borenstein’s book Making Things See. OpenNI has features (skeleton tracking, gesture recognition, etc.) that are not available in this library. Unfortunately, PrimeSense, the company behind OpenNI, was recently purchased by Apple and, while I thought OpenNI was shut down, there appear to be some efforts to revive it! It’s unclear what the future of OpenNI and SimpleOpenNI will be.
The easiest way to install the library is with the Processing Contributions Manager: go to Sketch → Import Libraries → Add library and search for “Kinect”. A button labeled “install” will appear. If you want to install it manually, download the most recent release and extract it into the libraries folder. Restart Processing, open one of the examples in the examples folder, and you are good to go!
Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.
If you are comfortable with C++, I suggest you consider using openFrameworks or Cinder with the Kinect. These environments have some additional features, and you may also get a C++ speed advantage when processing the depth data.
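The first thing to do is include the proper import statements for the library at the top of your code. You also need a reference to a Kinect object, e.g. a variable declared as Kinect kinect. Then, in setup(), you can initialize that kinect object with new Kinect(this) and start it with initDevice(). If you are using a Kinect v2, use a Kinect2 object instead.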
Once you’ve done this you can begin to access data from the kinect sensor. Currently, the library makes data available to you in five ways:
PImage (RGB) from the kinect video camera.
PImage (grayscale) from the kinect IR camera.
PImage (grayscale) with each pixel’s brightness mapped to depth (brighter = closer).
PImage (RGB) with each pixel’s hue mapped to depth.
int array with raw depth data (11-bit numbers between 0 and 2047).
Let’s look at these one at a time. If you want to use the Kinect just like a regular old webcam, you can access the video image as a PImage!
You can simply ask for this image in draw() with getVideoImage(); however, you can also use videoEvent() to know when a new image is available.
If you want the IR image on a Kinect v1, turn it on with enableIR(true).
With the Kinect v1, you cannot get both the video image and the IR image. They are both passed back via getVideoImage(), so whichever one was most recently enabled is the one you will get. However, with the Kinect v2, they are available through separate methods: getVideoImage() and getIrImage().
Now, if you want the depth image, you can request the grayscale image with getDepthImage().
You can also request the raw depth data as an int array with getRawDepth().
For the Kinect v1, the raw depth values range between 0 and 2047; for the Kinect v2, the range is between 0 and 4500.
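To give a feel for how those raw ranges relate to a grayscale depth image (brighter = closer), here is a minimal plain-Java sketch that maps a raw value onto a 0 to 255 brightness. The class name, method name, and the linear mapping are assumptions made for illustration; this is not necessarily how the library builds its own depth image.

```java
// Hypothetical helper for illustration; not part of the Kinect library.
class DepthBrightness {
  static final int V1_MAX_RAW = 2047; // largest 11-bit value (Kinect v1)
  static final int V2_MAX_RAW = 4500; // upper bound quoted for the Kinect v2

  // Map a raw depth reading to a brightness in 0..255, brighter = closer.
  // Assumes a simple linear mapping and clamps out-of-range readings.
  static int toBrightness(int raw, int maxRaw) {
    int clamped = Math.max(0, Math.min(raw, maxRaw));
    return 255 - (clamped * 255) / maxRaw;
  }

  public static void main(String[] args) {
    System.out.println(toBrightness(0, V1_MAX_RAW));    // nearest possible reading
    System.out.println(toBrightness(2047, V1_MAX_RAW)); // farthest v1 reading
    System.out.println(toBrightness(2250, V2_MAX_RAW)); // mid-range v2 reading
  }
}
```

Passing the maximum as a parameter keeps the same helper usable for both sensor generations.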
For the color depth image, use kinect.enableColorDepth(true). And just like with the video image, there’s a depth event you can access if necessary.
Unfortunately, because the RGB camera and the IR camera are not physically located in the same spot, there is a stereo vision problem. Pixel XY in one image is not the same XY in an image from a camera an inch to the right. The Kinect v2 offers what’s called a “registered” image, which aligns all the depth values with the RGB camera ones. This can be accessed with getRegisteredImage().
Finally, for the Kinect v1 (but not v2), you can also adjust the camera angle with the setTilt() method.
So, there you have it. Here are all the useful functions you might need to use the Processing kinect library:
initDevice() - start everything (video, depth, IR)
activateDevice(int) - activate a specific device when multiple devices are connected
initVideo() - start video only
enableIR(boolean) - turn on or off the IR camera image (v1 only)
initDepth() - start depth only
enableColorDepth(boolean) - turn on or off the depth values as color image
enableMirror(boolean) - mirror the image and depth data (v1 only)
PImage getVideoImage() - grab the RGB (or IR for v1) video image
PImage getIrImage() - grab the IR image (v2 only)
PImage getDepthImage() - grab the depth map image
PImage getRegisteredImage() - grab the registered depth image (v2 only)
int[] getRawDepth() - grab the raw depth data
float getTilt() - get the current sensor angle (between 0 and 30 degrees) (v1 only)
setTilt(float) - adjust the sensor angle (between 0 and 30 degrees) (v1 only)
For everything else, you can also take a look at the javadoc reference.
There are four basic examples for both v1 and v2.
Code for v1: RGBDepthTest
Code for v2: RGBDepthTest2
This example uses all of the above listed functions to display the data from the kinect sensor.
Both v1 and v2 have multiple kinect support.
Code for v1: MultiKinect
Code for v2: MultiKinect2
Code for v1: PointCloud
Code for v2: PointCloud
Here, we’re doing something a bit fancier. Number one, we’re using the 3D capabilities of Processing to draw points in space. You’ll want to familiarize yourself with translate(), rotate(), pushMatrix(), popMatrix(). This tutorial is also a good place to start. In addition, the example uses a PVector to describe a point in 3D space. More here: PVector tutorial.
The real work of this example, however, doesn’t come from me at all. The raw depth values from the kinect are not directly proportional to physical depth. Rather, they scale with the inverse of the depth according to this formula:
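The formula is not reproduced in this text. The form commonly circulated with this example, with raw value $r$ and constants that should be treated as approximate and uncalibrated (an assumption here), is:

```latex
d_{\text{meters}}(r) =
\begin{cases}
  \dfrac{1}{-0.0030711016 \, r + 3.3309495161} & \text{if } r < 2047 \\[2ex]
  0 & \text{otherwise}
\end{cases}
```

Note that the denominator reaches zero near $r \approx 1084$, so the result is only physically meaningful for raw values below that.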
Rather than do this calculation all the time, we can precompute all of these values in a lookup table since there are only 2048 depth values.
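A minimal plain-Java sketch of that lookup table might look like the following. The class name is hypothetical, and the constants are the commonly cited approximate values rather than a calibration of any particular device; in a Processing sketch the array and function would simply sit alongside setup() and draw().

```java
// Hypothetical stand-alone class for illustration.
class DepthLookup {
  // One entry per possible 11-bit raw depth value.
  static final float[] depthLookUp = new float[2048];

  static {
    // Precompute every conversion once instead of on every pixel, every frame.
    for (int i = 0; i < depthLookUp.length; i++) {
      depthLookUp[i] = rawDepthToMeters(i);
    }
  }

  // Approximate raw-to-meters conversion (assumed constants; calibrate for accuracy).
  // Raw values above roughly 1084 make the denominator negative and are not meaningful.
  static float rawDepthToMeters(int depthValue) {
    if (depthValue < 2047) {
      return (float) (1.0 / ((double) depthValue * -0.0030711016 + 3.3309495161));
    }
    return 0.0f; // 2047 is treated as "no reading"
  }

  public static void main(String[] args) {
    System.out.println("raw 500  -> " + depthLookUp[500] + " m");
    System.out.println("raw 1000 -> " + depthLookUp[1000] + " m");
  }
}
```

Converting a pixel then becomes a single array read, e.g. depthLookUp[rawDepth].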
Thanks to Matthew Fisher for the above formula. (Note: for the results to be more accurate, you would need to calibrate your specific kinect device, but the formula is close enough for me so I’m sticking with it for now. More about calibration in a moment.)
Finally, we can draw some points based on the depth values in meters:
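As a rough sketch of that step, the plain-Java code below walks the raw depth array, converts each sampled value to meters, and back-projects it to a 3D point with a simple pinhole model. Everything named here is hypothetical: the class, the 640x480 v1 frame size used in main(), and especially the focal length and image center, which are illustrative placeholders rather than calibrated intrinsics. In a Processing sketch, each resulting point would be drawn with point() inside pushMatrix()/popMatrix().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone class for illustration.
class DepthPoints {
  // Placeholder camera intrinsics (assumptions, not calibrated values).
  static final float FOCAL_PX = 594.0f;
  static final float CX = 320.0f;
  static final float CY = 240.0f;

  // Approximate raw-to-meters conversion (assumed constants).
  static float rawDepthToMeters(int raw) {
    if (raw < 2047) {
      return (float) (1.0 / ((double) raw * -0.0030711016 + 3.3309495161));
    }
    return 0.0f;
  }

  // Back-project pixel (x, y) at depth z (meters) to {x, y, z} in meters.
  static float[] depthToWorld(int x, int y, float z) {
    return new float[] { (x - CX) * z / FOCAL_PX, (y - CY) * z / FOCAL_PX, z };
  }

  // Sample every "skip"-th pixel and keep only points with a positive depth.
  static List<float[]> buildPoints(int[] depth, int width, int height, int skip) {
    List<float[]> points = new ArrayList<>();
    for (int x = 0; x < width; x += skip) {
      for (int y = 0; y < height; y += skip) {
        int offset = x + y * width; // index into the 1D depth array
        float z = rawDepthToMeters(depth[offset]);
        if (z > 0) {
          points.add(depthToWorld(x, y, z));
        }
      }
    }
    return points;
  }

  public static void main(String[] args) {
    int[] depth = new int[640 * 480];
    Arrays.fill(depth, 1000); // stand-in for real sensor data
    System.out.println(buildPoints(depth, 640, 480, 4).size() + " points");
  }
}
```

Skipping pixels trades detail for speed, which matters when every point is drawn individually each frame.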
The real magic of the kinect lies in its computer vision capabilities. With depth information, you can do all sorts of fun things like say: “the background is anything beyond 5 feet. Ignore it!” Without depth, background removal involves all sorts of painstaking pixel comparisons. As a quick demonstration of this idea, here is a very basic example that computes the average xy location of any pixels in front of a given depth threshold.
Source for v1: AveragePointTracking
Source for v2: AveragePointTracking2
In this example, I declare two variables to add up all the appropriate x’s and y’s and one variable to keep track of how many there are.
Then, whenever we find a given point that complies with our threshold, I add the x and y to the sum:
When we’re done, we calculate the average and draw a point!
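The three steps above can be sketched in plain Java as follows. The class and method names are hypothetical, and the depth array here is a stand-in for the one returned by getRawDepth(); in a Processing sketch the result would be drawn with something like ellipse().

```java
// Hypothetical stand-alone class for illustration.
class AveragePoint {
  // Returns {avgX, avgY} of all pixels closer than the threshold,
  // or null when no pixel qualifies.
  static float[] averagePoint(int[] depth, int width, int height, int threshold) {
    float sumX = 0; // running total of qualifying x positions
    float sumY = 0; // running total of qualifying y positions
    float count = 0; // how many pixels qualified

    for (int x = 0; x < width; x++) {
      for (int y = 0; y < height; y++) {
        int offset = x + y * width; // index into the 1D depth array
        if (depth[offset] < threshold) {
          sumX += x;
          sumY += y;
          count++;
        }
      }
    }

    if (count == 0) {
      return null; // avoid dividing by zero when nothing is in range
    }
    return new float[] { sumX / count, sumY / count };
  }

  public static void main(String[] args) {
    int width = 4, height = 3;
    int[] depth = new int[width * height];
    java.util.Arrays.fill(depth, 1000); // far background
    depth[1 + 1 * width] = 500;         // near pixel at (1, 1)
    depth[3 + 1 * width] = 500;         // near pixel at (3, 1)
    float[] avg = averagePoint(depth, width, height, 600);
    System.out.println(avg[0] + ", " + avg[1]);
  }
}
```

Guarding against a zero count matters in practice, since an empty scene would otherwise produce NaN coordinates.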