When Microsoft released the Kinect, the first widely available depth camera, artists and makers quickly flocked to it, discovering how to access the camera's depth data and its ability to capture human motion. The result was a slew of new projects that capitalized on an interface that didn't require a screen, buttons, or a mouse.
After the Kinect's initial success, experimentation by the maker community trailed off as many of the open source tools surrounding the device stopped being developed or were rolled into proprietary technology like the iPhone or Microsoft's HoloLens.
Luxonis's release of the Oak-D (short for "OpenCV AI Kit") might be the successor to the Kinect that the maker community needs to once again create robots, interfaces, and art that can easily interpret their surroundings. The camera, which is built on open source tools like OpenCV, capitalizes on many of the advances in computer vision since the Kinect was released, allowing users to integrate machine learning models that not only identify poses and locations but also recognize objects, emotions, and more.
The Oak-D consists of three cameras: two dedicated to stereo vision that help identify the exact location of objects in 3D space, and a third that provides a 4K color image. Unlike other depth cameras that use structured light (projecting a grid onto the subject to see how it deforms) or time of flight (measuring the time it takes light to travel to and from the subject), the Oak-D works much like our brain does, calculating depth from the offset between the images its two stereo cameras see.
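The stereo arithmetic behind this is simple to sketch: depth falls out of the disparity (pixel offset) between the two images, scaled by the camera's focal length and the distance between the two lenses. The numbers below are illustrative, not the Oak-D's actual calibration values.

```python
# Toy stereo-depth calculation: depth is inversely proportional to disparity.
# The focal length and baseline here are hypothetical, not the Oak-D's real values.
FOCAL_LENGTH_PX = 800.0  # focal length in pixels (assumed)
BASELINE_M = 0.075       # distance between the two mono cameras, in meters (assumed)

def depth_from_disparity(disparity_px: float) -> float:
    """depth (m) = focal length (px) * baseline (m) / disparity (px)"""
    return FOCAL_LENGTH_PX * BASELINE_M / disparity_px

# The farther the object, the smaller the offset between the two views.
print(depth_from_disparity(60.0))  # 1.0 m away
print(depth_from_disparity(30.0))  # 2.0 m away
```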
What makes the Oak-D a really flexible tool is the Intel Myriad X chip that synthesizes these images and provides machine learning inference processing. Beyond just depth, the Myriad X allows the Oak-D to recognize objects, facial expressions, or whatever machine learning model the user chooses to run. It also means that while you will still need a computer to run the camera, the Myriad X can handle most of the heavy lifting. Instead of a desktop with a high-performance graphics card, you can run it with a Raspberry Pi.
Testing it out
I tried the Oak-D on a Mac laptop and a Raspberry Pi 3. Apart from some minor differences in installation, the development process is much the same on both. Both systems performed at about the same frames per second, which is to be expected given that the Myriad X chip is doing most of the processing. The flexibility of using multiple systems for development makes for an easier workflow: you can create and tweak your code on a laptop or desktop, then move to a single-board computer like the Pi when you are ready to deploy. Luxonis provides walkthroughs for getting started on Mac, the Pi, and other Linux systems, as well as Windows machines.
Development on the Oak-D is done in Python through Luxonis's interface library, DepthAI. Python is the language of choice for many machine learning developers, making it easier to integrate existing machine learning tools and models into your project. If you are new to Python development it may take a while to get used to virtual environments and package management, but the syntax is fairly easy. It also helps to have some understanding of how machine learning models work and are built, but you can get started by trying out the various models Luxonis has prepared for the Oak-D. Some familiarity with the Linux command line will also be useful for setting up your Python environment and working with the camera.
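The environment setup follows the usual Python pattern. A minimal sketch (the `depthai` package name is what Luxonis publishes on PyPI; the environment name `oak-env` is just an example):

```shell
# Create and activate an isolated environment for DepthAI experiments
python3 -m venv oak-env
. oak-env/bin/activate

# Install Luxonis's DepthAI Python API into the environment
pip install --upgrade pip
pip install depthai
```

On the Raspberry Pi you would run the same commands; only the platform-specific prerequisites in Luxonis's walkthroughs differ.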
Luxonis's documentation for the camera is straightforward but a bit disorganized. Once I found their Python API I was able to get started and explore the various features easily. Their DepthAI demo project is flexible enough that you can try out most of the camera's features and a host of machine learning models for various tasks without writing any code, though you will have to parse through some command-line options to take full advantage of the script. To make a project truly your own, you'll want to dive into the Python code, tweak parameters, and eventually write your own scripts.
Luxonis's demo projects provide a number of machine learning models to test out the camera, but you can get more through Intel's OpenVINO project and its Open Model Zoo. OpenVINO optimizes models to run efficiently on the Myriad X chip. I was able to run face detection, emotion recognition, object recognition, and several human pose recognition models, all of which were pretty accurate. There was a lag of a second or so in recognizing poses when I was moving quickly.
As with all computer vision models, these have their blind spots. The original Kinect was not great at identifying people of different body types, and many machine learning models have shown bias against people of different races and skin tones. The Oak-D doesn't solve this problem, but because you decide what runs on the device, you have a better chance of addressing it than on more closed devices.
With this flexibility comes more work in getting the device to do your bidding. For those newer to Python, the command line, or machine learning, it may take a bit longer to get up and running. But if you are ready for the challenge, this device is a powerful and flexible way of seeing the world around you.
Type: AI accelerated camera
Clock Speed: 1.43GHz CPU, 921MHz GPU
Processor: Intel Myriad X visual processing unit, 700 MHz
Cameras:
- 2 OV9282 stereo sensors, 1280 × 720 px, fixed focus 19.6 cm–infinity
- 1 IMX378 sensor, 4K 60Hz video, 4056 × 3040 px, autofocus 8 cm–infinity
Memory: 2 GB 64-bit LPDDR4
Input Voltage: 5 V, 3 A