Yesterday, Microsoft launched a new product called Kinect. It is an add-on for the very popular Xbox 360 game console that turns the user's own body into the controller. No more fiddling with weirdly shaped controllers: just step in front of your television and you can control games with your own gestures (and your own voice).
It is truly revolutionary. Until now, vision systems that can interpret the world around them in three dimensions could only be found in laboratories and factories. It is the first time in history that 3D vision technology has reached the living room. And since Microsoft plans to spend $500 million on marketing Kinect, I am sure a lot of people will know about it soon.
How does it work?
The Kinect system consists of one projector and two cameras (plus four microphones that I will not discuss). The projector emits infrared light that is invisible to the human eye. The system measures how long it takes each infrared ray to travel from the projector, bounce off the user's body, and arrive at a 320 x 240 pixel infrared camera. From that “time-of-flight” information, Kinect calculates the distance to 320 x 240 = 76,800 points in the environment. The second camera is a standard 640 x 480 pixel color camera.
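The time-of-flight principle described here comes down to simple arithmetic: light travels at roughly 300,000 km per second, so the round trip to a player a few meters away takes only nanoseconds. A minimal sketch of the calculation (the numbers are illustrative, not Kinect's actual specifications):

```python
# Time-of-flight: distance is half the round trip at the speed of light.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def distance_from_round_trip(seconds: float) -> float:
    """Distance to the reflecting surface, given the measured round-trip time."""
    return SPEED_OF_LIGHT * seconds / 2.0

# A player standing 2 m away reflects the light back in about 13 nanoseconds,
# which is why the sensor needs very fast semiconductor timing circuits.
round_trip = 2 * 2.0 / SPEED_OF_LIGHT
print(round_trip)                            # roughly 1.33e-08 seconds
print(distance_from_round_trip(round_trip))  # 2.0 m (up to rounding)
```

The nanosecond scale of these intervals is exactly why, as noted later in the post, such cameras only became feasible once sufficiently fast semiconductor circuits existed.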
The combination of distances and colors allows the system to easily distinguish the players from the background. The information is detailed enough to identify the different body parts, and to interpret the movements and gestures of the player.
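Why does a depth map make separating the player from the background so easy? Because the player is simply closer than everything else. A toy illustration using a made-up 4 x 4 depth grid (not real Kinect data, and a deliberately crude threshold):

```python
# Toy example: separate a "player" from the background using only depth.
# Each value is a distance in meters on a tiny, invented 4x4 depth map.
depth_map = [
    [3.0, 3.0, 3.0, 3.0],
    [3.0, 1.2, 1.3, 3.0],
    [3.0, 1.2, 1.2, 3.0],
    [3.0, 3.0, 1.3, 3.0],
]

THRESHOLD = 2.0  # assume anything closer than 2 m is the player

mask = [[d < THRESHOLD for d in row] for row in depth_map]
player_pixels = sum(cell for row in mask for cell in row)
print(player_pixels)  # 5 pixels belong to the player
```

A color camera alone cannot do this reliably, since a player's shirt may match the wallpaper; with depth, the foreground falls out of a single comparison.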
How does it feel?
I only have second-hand information about the experience of using Kinect. What I understand from the reviews is this: it is even more engaging than playing with the Nintendo Wii or the Sony PlayStation Move. With those systems, only the hand holding the remote influences the game. With Kinect, your whole body affects the gameplay. It is dubbed a “natural user interface”.
One complaint about Kinect is that it is not entirely robust yet. Changing light conditions, such as a beam of sunlight, can disrupt its performance. Another problem is the delay between the movement of your body and the response on screen, which can be quite annoying. Not so natural after all.
Why is Microsoft first?
They are not. The principle has been around forever. It is the same as sonar: the way bats listen to the reflections of their own sounds to navigate. Infrared time-of-flight cameras became available with the advent of semiconductor circuits fast enough to measure these tiny time intervals. There is a whole industry, called machine vision, that sells these systems alongside other vision technologies such as laser triangulation and stereo cameras. That community is gathering next week in Stuttgart at VISION 2010, for the 23rd time. You will not find many consumer companies there.
Microsoft is first in putting this technology in the hands of consumers, though it did not develop its own 3D sensing system. Its supplier is the Israeli company PrimeSense. To obstruct competitors and gather a pile of patents, Microsoft acquired two companies that had developed time-of-flight systems: 3DV Systems, also from Israel, and Canesta, based in Silicon Valley.
Thanks, Microsoft, for selling this cool gadget! And thanks for so thoroughly spreading the word with your big marketing machine. It will make things so much easier for me to explain: I develop cameras that understand what they see in 3D. No add-ons required.