The heart of the Google AIY Voice setup is the Voice Hat, an add-on board for the Raspberry Pi Zero. The Voice Hat doesn’t actually have any onboard speech processing. That is handled by Google’s cloud (or another service like Amazon’s Alexa). The Hat primarily provides a decent speaker amplifier and a secondary board with a stereo microphone.
The original Voice Hat, designed for a full-size Pi, broke out a number of input-output pins, with space to easily add servos as well as drive some higher loads (up to 500mA). This makes it easy to give your voice-controlled device some motion or other physical interface. The newer release uses the Pi Zero, so it’s a little more limited, but the Pi Zero is included in the kit, so you don’t have to supply your own.
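To give a sense of what driving a servo from one of those breakout pins involves, here is a minimal Python sketch of the arithmetic behind the control signal. It assumes a typical hobby servo that expects a 50Hz signal with pulses between roughly 1ms and 2ms. Those defaults, and the function name `servo_duty_cycle`, are illustrative assumptions rather than anything taken from the AIY software, so check your servo’s datasheet before relying on them.

```python
def servo_duty_cycle(angle_deg, min_pulse_ms=1.0, max_pulse_ms=2.0,
                     frequency_hz=50.0, max_angle_deg=180.0):
    """Return the PWM duty cycle (in percent) for a target servo angle.

    Assumes a hobby servo whose position is set by pulse width. The
    default pulse range and frequency are typical values, not the
    specification of any particular servo.
    """
    if not 0.0 <= angle_deg <= max_angle_deg:
        raise ValueError("angle_deg must be between 0 and max_angle_deg")
    # Linearly map the requested angle onto the pulse-width range.
    span_ms = max_pulse_ms - min_pulse_ms
    pulse_ms = min_pulse_ms + span_ms * angle_deg / max_angle_deg
    # One PWM period at 50Hz lasts 20ms.
    period_ms = 1000.0 / frequency_hz
    return 100.0 * pulse_ms / period_ms


# With the assumed defaults, the centre position (90 degrees) is a
# 1.5ms pulse in a 20ms period.
print(servo_duty_cycle(90))  # 7.5
```

On a real Pi you would hand a value like this to whatever PWM library you are using; the calculation itself runs anywhere.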
Like the Voice kit, the AIY Vision kit also has an add-on board for the Raspberry Pi Zero, a cardboard enclosure, and an arcade button. But this board, the Vision Bonnet, has some real power to do onboard image analysis without the cloud. It uses an Intel Movidius MA2450 vision chip along with the Raspberry Pi Camera Module.
The MA2450 is designed for low-power environments like mobile phones and helps the Pi deal with the vast amount of data generated by the camera’s live video stream, allowing this small device to process the input and recognize faces and other objects quickly.
Google’s example code provides pre-trained models for faces, expressions, and objects like cats and dogs. You can even train your own models, though not on the device itself. For that, you’ll need to dive into a deep learning environment like Google’s TensorFlow. Training a model to classify objects from thousands of example images is too intensive for such a small device to do in a reasonable amount of time. However, the device’s raw image processing is still powerful, and really useful if you want to make a responsive vision-based interface without an expensive computer and graphics card attached.
The form factor of the Pi Zero doesn’t allow for the additional breakouts found on the Voice Hat, like the transistors to drive high loads, but it does break out four of the Pi’s I/O pins, power, and ground so you can connect additional inputs and outputs. You may also eventually want to create a sturdier enclosure, as the included cardboard setup can wear out after a few reassemblies.
The Matrix Voice is the most capable of the three boards with an 8-channel microphone array and a chip for audio processing. This is the second board by Matrix Labs, preceded by the more expensive but more full-featured Matrix Creator.
The Matrix boards use a Field Programmable Gate Array (FPGA) to process the raw audio input from the 8-channel microphone array, performing tasks such as noise cancellation and beamforming. Matrix has preprogrammed the FPGA with many of the needed audio algorithms, but you are free to tinker with them. Like the AIY Voice kit, the speech recognition and natural language processing used to turn users’ speech into usable commands is handled by cloud services like Google or Amazon.
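If beamforming is unfamiliar, the basic idea can be sketched in a few lines of Python. The example below is the simplest textbook form, delay-and-sum, working on whole-sample delays. It is an illustration only, not Matrix’s FPGA implementation, and the function name `delay_and_sum` and the toy signals are made up for this sketch.

```python
def delay_and_sum(channels, delays):
    """Time-align microphone channels and average them.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   for each microphone, how many samples late the wanted
              sound arrives (whole samples only in this sketch).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    length = len(channels[0])
    output = []
    for n in range(length):
        total = 0.0
        for samples, delay in zip(channels, delays):
            index = n + delay  # look ahead to undo this mic's arrival delay
            if 0 <= index < length:
                total += samples[index]
        output.append(total / len(channels))
    return output


# A click reaches mic_a at sample 2 and mic_b one sample later.
mic_a = [0, 0, 1, 0, 0, 0]
mic_b = [0, 0, 0, 1, 0, 0]

aligned = delay_and_sum([mic_a, mic_b], [0, 1])
unaligned = delay_and_sum([mic_a, mic_b], [0, 0])
print(aligned)    # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(unaligned)  # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
```

When the delays match the direction the sound comes from, the click adds up at full strength; with the wrong delays it is smeared out and weakened. Choosing the delays is what steers the array toward a speaker, and a real implementation would also handle fractional delays and all eight channels.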
The Matrix Voice supports a few more features than the AIY Voice, with both speaker output and headphone jack, an LED ring, and additional I/O pins. If you get the version with the ESP32 chip, you can operate the board with or without a Raspberry Pi.
Matrix Labs sees their boards as part of a platform of IoT devices and apps, and they’ve even provided a repository so you can easily add other people’s apps to your Matrix-enabled Pi.
• • •
Using a voice assistant such as Google Assistant or Amazon Alexa with either the AIY Voice or the Matrix Voice requires some significant setup with those services. You’ll need to answer questions about the app you’re creating, as well as create tokens and credentials that connect your device, app, and the different cloud services. This process is documented but not particularly straightforward.
In addition, there’s some setup required on the Pi itself to configure the hardware and install the development environments and examples. It’s useful to have some experience with Linux and/or the Raspberry Pi environment if the build doesn’t go completely smoothly.
The big advantage of all these boards is the preprocessing they do with raw audio and video input. And with the audio boards, many of the AI features of the cloud services like Google Assistant and Amazon Alexa can be accessed from a simple Raspberry Pi computer. So why not give your next project some smarts?