Machine learning (ML) is a growing field, gaining popularity in academia, industry, and among makers. We will take a look at some of the available tools to help make machine learning easier, but first, let’s review some of the terms commonly used in machine learning.
John McCarthy provides a definition of artificial intelligence (AI) in his 2007 Stanford paper, “What is Artificial Intelligence?” In it, he says AI “is the science and engineering of making intelligent machines, especially intelligent computer programs.” This definition is extremely broad, as McCarthy defines intelligence as “the computational part of the ability to achieve goals in the world.” As a result, any program that achieves some goal can easily be classified as artificial intelligence.
In her article “Machine Learning on Microcontrollers” (Make: Vol. 75), Helen Leigh gives us a great definition of machine learning: “With traditional programming, you explicitly tell a computer what it needs to do using code, but with machine learning the computer finds its own solution to a problem based on examples you show it.” In practice, this means collecting data or finding a pre-built dataset to train a mathematical model, much like training a child to recognize the difference between a dog and a cat from various photos.
The trained model should make successful predictions or classifications when presented with unseen data, in a process called inference. This process is similar to showing the child a new photo of a cat and seeing if they guess correctly.
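To make the split between training and inference concrete, here is a minimal sketch in plain Python. It uses a nearest-centroid classifier, chosen only because it is short enough to read in full; it is not the method used by any tool mentioned in this article. The feature values (weight and ear length) and the function names `train` and `infer` are made up for illustration.

```python
from statistics import mean


def train(samples):
    """Training: compute one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def infer(model, features):
    """Inference: return the label whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical labeled examples: (weight in kg, ear length in cm)
training_data = [
    ((4.0, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 12.0), "dog"),
    ((30.0, 14.0), "dog"),
]

model = train(training_data)           # done once, on the examples we collected
prediction = infer(model, (4.5, 6.5))  # run on data the model has not seen
print(prediction)
```

Real models are far more elaborate, but the shape is the same: `train` consumes labeled examples and produces a model, and `infer` applies that model to a new sample.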
In practice, training a machine learning model requires more complex math and computing power than inference does. As a result, we often see training occur on large desktops or servers, which gives us the option of performing inference on small embedded devices using the newly trained model. Training on a microcontroller is theoretically possible, but most microcontrollers don’t have the memory and computing power necessary to perform the required calculations.
From this, we can conclude that machine learning is a subset of AI. All machine learning achieves some goal, so it is a part of AI, but not all AI programs are machine learning. Another common term you might run across is deep learning, which was coined by Rina Dechter in her 1986 research paper on machine learning algorithms. Deep learning is the use of more complex machine learning models to achieve better accuracy. Therefore, deep learning is a subset of machine learning.
One of the biggest hurdles in machine learning is gathering data for the training process. For typical ML training, called supervised learning, people must carefully curate the dataset, which includes labeling every sample by hand. Additionally, data scientists must eliminate or limit any biases in the dataset. For example, I created a voice-activated Halloween pumpkin that would laugh and flash whenever someone said “trick or treat” (above). I mistakenly used only one adult male and one adult female voice to train the model to recognize the phrase. As a result, the model was biased toward adult voices; it was incapable of correctly classifying children’s voices!
Bias, with regard to statistics and machine learning, is some error or distortion that stems from statistical analysis or model training. This personal story illustrates a type of selection bias, where the selection of data is not representative of the population intended to be analyzed or used for inference. Bias is a big concern in statistics, as it can skew results and interpretations. Therefore, it is also a big concern in machine learning.
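A simple audit of who is represented in a dataset can catch this kind of selection bias before training starts. The sketch below assumes each recording is tagged with a speaker group; the file names, group names, and helper functions are hypothetical, and a real audit would look at more than one attribute.

```python
from collections import Counter


def group_shares(samples):
    """Return the fraction of samples that belong to each speaker group."""
    counts = Counter(group for _, group in samples)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def missing_groups(samples, expected_groups):
    """Return expected groups that have no samples at all."""
    present = {group for _, group in samples}
    return sorted(set(expected_groups) - present)


# Hypothetical recordings, tagged by who is speaking
recordings = [
    ("clip_001.wav", "adult_male"),
    ("clip_002.wav", "adult_female"),
    ("clip_003.wav", "adult_male"),
    ("clip_004.wav", "adult_female"),
]

# The people we expect to say "trick or treat" at the door
expected = ["adult_male", "adult_female", "child"]

shares = group_shares(recordings)
gaps = missing_groups(recordings, expected)
print(shares)
print("No samples for:", gaps)
```

Here the check reports that children are missing entirely, which is the gap that tripped up the pumpkin.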
Pre-made datasets exist, but they are often unique to a particular problem or have no real-world application. For example, if someone shared a dataset to classify motion gestures, the motions would be dependent on the type of sensor used and its placement. Data collected from a movement performed with a glove sensor would look different than a similar motion where the sensor is placed at the end of a wand.
However, a few pre-made datasets can help get you started. I regularly use the Google Speech Commands Dataset as the foundation for various keyword spotting projects. This dataset consists of several dozen spoken words, each containing over 1,000 audio samples taken from different speakers. I collect additional samples for my target keyword or phrase, such as “trick or treat,” and use the pre-made dataset to fill out samples for the “unknown” label.
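One way to combine your own recordings with a pre-made dataset is to keep the labels you care about and fold everything else into “unknown.” The sketch below shows that relabeling step only. The file paths and the `relabel` helper are hypothetical; “yes” and “stop” stand in for words from the pre-made set.

```python
def relabel(samples, target_labels, unknown_label="unknown"):
    """Keep target labels as-is and map every other label to unknown_label."""
    targets = set(target_labels)
    return [
        (path, label if label in targets else unknown_label)
        for path, label in samples
    ]


# Hypothetical file list: one custom recording plus two pre-made samples
samples = [
    ("my_recordings/trick_or_treat_01.wav", "trick or treat"),
    ("speech_commands/yes_01.wav", "yes"),
    ("speech_commands/stop_01.wav", "stop"),
]

relabeled = relabel(samples, ["trick or treat"])
for path, label in relabeled:
    print(label, path)
```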
The MNIST dataset contains 70,000 samples of the handwritten digits 0–9. This dataset has been used in machine learning research for decades and can be a great starting point for optical character recognition (OCR) systems. Converting handwritten addresses to computer text, for example, helps postal services to automate mail delivery systems.
TensorFlow also comes with a number of datasets, including various sound, image, and text samples. Most of these sets, such as MNIST, are created with teaching and research in mind. As a result, you may see limited use for them in real-world applications.
Kaggle is a community of machine learning researchers and practitioners. It is known for hosting frequent competitions that encourage programmers to submit unique machine learning models and algorithms to tackle tough, real-world problems. Most of these competitions include pre-made datasets that can be downloaded for experimentation, even after the competition is over.
Finally, many users create or curate datasets on GitHub. Some of these require searching the internet for your particular application, but others, such as the Awesome Public Datasets repository, provide a list of datasets that are easy to navigate.
Some datasets, like the Google Speech Commands Dataset, are created with low-power ML applications in mind. These low-power applications are often reserved for microcontrollers and referred to as TinyML.
Other datasets, like those used for natural language processing, can take up gigabytes and are usually reserved for larger machine learning applications running on desktops or servers.
TOOLS OF THE TRADE
A few tools and applications have helped make training and deploying machine learning models significantly easier.
Python is the de facto programming language for curating datasets, analyzing sample features, and training machine learning models.
Several frameworks exist for creating, testing, and deploying machine learning models. One of the most popular frameworks is Google’s TensorFlow. It works with a variety of languages, but it is specifically targeted at neural networks and deep learning.
TensorFlow Lite is a subset of TensorFlow for deploying models to smartphones and embedded Linux devices, like the Raspberry Pi. It contains functions necessary for inference, but it cannot be used for training. You’ll also find TensorFlow Lite Micro within that framework, which targets microcontrollers.
Other frameworks for developing machine learning models include scikit-learn, Shogun, PyTorch, CNTK, and MXNet. Apple has their own proprietary system called Create ML that allows you to train models and Core ML to deploy them to macOS, iOS, watchOS, and tvOS.
Online editors like Google Colab provide a development interface and pre-installed machine learning packages, such as TensorFlow. While they may not be ideal for production-level training and deployment, they offer a fantastic learning environment for working with machine learning, especially for creating TinyML models.
Edge Impulse is a graphical online tool that helps you analyze and extract features from your data, train machine learning models, and generate efficient libraries for performing inference on microcontrollers (see “Build an AI Smart Nose”). I use Edge Impulse for my motion and audio classification projects.
Lobe.ai is another graphical online tool used for training machine learning models, focused on classifying images. It allows you to download trained models in several formats, including Core ML, TensorFlow, and TensorFlow Lite. While these models might work on single-board computers and smartphones (see “Trash Classifier” on page 44 of Make: Vol. 77), they would require further optimization to function well on microcontrollers.
Other similar online tools include Runway ML, Teachable Machine, and V7 Labs.
These services make it easy to create models with supervised learning to classify images, video, and sound.
EMBEDDED MACHINE LEARNING APPLICATIONS
Embedded systems are computing devices that serve some specialized purpose, such as handheld calculators, microwave oven controllers, and traffic light control systems. They are usually designed to be easily manufactured, inexpensive, and low-power. Popular maker brands of embedded systems include Arduino (microcontroller boards) and Raspberry Pi (single-board computers). Because many machine learning algorithms are computationally expensive, attempting to run ML on such devices may seem silly. However, there are many potential uses of ML on embedded systems.
One prevalent use of ML on microcontrollers (TinyML) is wake word detection, which is also known as keyword spotting. For example, if you say “Alexa” or “Hey Siri,” your phone or nearby smart speaker may come to life, waiting for further instructions. The smart speaker uses two types of machine learning. The first kind is TinyML, where inference is performed locally in the speaker’s microcontroller to listen for that wake word. Once the speaker hears the wake word, it begins streaming the subsequent audio to an internet-connected server, which performs a much more complex machine learning process known as natural language processing (NLP) to figure out what you’re asking.
In addition to speech, we can use machine learning to change the way we interact with electronics. For example, makers Salman Faris and Suhail Jr. created smart glasses for the visually impaired that would take a picture and tell the wearer what it saw through headphones (above). We could also use motion sensors and TinyML to detect and classify gestures, giving us the ability to translate sign language or perform actions by drawing shapes in the air with a wand.
In government and business, TinyML has the potential to complement Internet of Things (IoT) ecosystems. With a traditional machine learning architecture, networked sensors would need to stream data back to a central server for analysis and inference. Imagine trying to send sound or image data from tens or hundreds of sensors to a server. This setup could easily consume all of the available bandwidth in a network. Using embedded machine learning, we could have each sensor classify patterns and send the final results to the server, thus freeing up network bandwidth.
For example, the Prague Public Transit Company (DPP) has announced a partnership with Czech-based Neuron Soundware to produce audio sensors that listen to the sounds made by each of the 21 escalators in Prague’s metro system. The sensors will employ machine learning models to identify potential anomalies or unusual sound patterns to determine if certain parts in the escalator require maintenance. This approach is similar to a car mechanic listening to engine sounds to diagnose a problem.
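Some rough arithmetic shows how much bandwidth is at stake when sensors stream raw data instead of results. The numbers below are assumptions chosen for illustration, not figures from any real deployment: 100 sensors, each recording 16-bit mono audio at 16kHz, compared with each sensor sending one 16-byte classification result per second.

```python
def raw_audio_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bits per second needed to stream uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels


def result_bps(bytes_per_message, messages_per_second):
    """Bits per second needed to send only classification results."""
    return bytes_per_message * 8 * messages_per_second


SENSORS = 100  # assumed number of networked sensors

# Every sensor streams raw audio to a central server
streaming_total = SENSORS * raw_audio_bps(16_000, 16)

# Every sensor runs inference locally and reports a small result
edge_total = SENSORS * result_bps(16, 1)

print(f"Streaming raw audio: {streaming_total / 1e6:.1f} Mbit/s")
print(f"Sending results only: {edge_total / 1e3:.1f} kbit/s")
print(f"Reduction: {streaming_total // edge_total}x")
```

Under these assumptions, on-sensor inference cuts network traffic by a factor of 2,000. Compression and different reporting rates would change the exact ratio.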
Machine learning can help classify or identify patterns in almost any type of data. As a result, we can use motion and physiological data captured from body-worn sensors to assist with workouts and help predict potential issues. While a GPS unit doesn’t need machine learning to tell us how far we ran, what could we use to evaluate our jump shot in basketball? A combination of motion sensors and machine learning has the potential to provide such real-time feedback. Additionally, sensor suites like the EmotiBit can determine our stress level. Coupled with machine learning, this type of data could be used to classify our current emotional state or predict panic attacks before they occur.
Pete Warden is the lead developer of the Google TensorFlow Mobile team, which created TensorFlow Lite Micro, a popular framework used in many TinyML applications. He illustrates another potential use of TinyML: low-cost cameras paired with machine learning that can read old gauges and displays. In his article, he mentions how he has worked with “multiple teams who have legacy hardware that they need to monitor, in environments as varied as oil refineries, crop fields, office buildings, cars, and homes. Some of the devices are decades old, so until now the only option to enable remote monitoring and data gathering was to replace the system entirely with a more modern version.” Inexpensive, networked cameras could be used as an alternative solution to monitoring such devices without needing to fully replace the system.
Josef Müller demonstrates this low-cost gauge-reading device. He uses an ESP32 and camera to read the numbers on a water meter and report the measurements to a server.
Computer vision is a popular application of machine learning. For example, detecting objects on the road is extremely important for self-driving vehicles. Additionally, detecting the presence of people can be used to control lights and HVAC systems in an office building or to identify intruders for a security system. However, image classification and object detection are often computationally expensive. As a result, you will need a powerful microcontroller or single-board computer for many such applications.
TINYML POWER REQUIREMENTS
Machine learning usually boils down to a series of complex matrix operations — in essence, math. Almost every microcontroller and single-board computer can perform math operations, which means embedded systems are generally capable of doing machine learning. Some architectures offer features that make these operations faster, such as floating-point units or special multiply-accumulate instructions. However, the biggest concern is often, “Does my processor have enough power?”
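To show what “just math” looks like, here is a sketch of one fully connected neural network layer written as plain loops. Each step of the inner loop is a single multiply-accumulate, the operation that some processors speed up with dedicated instructions. The weights, biases, and inputs are arbitrary values picked for illustration.

```python
def dense_layer(inputs, weights, biases):
    """Compute one fully connected layer: output[i] = bias[i] + sum(x * w)."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias
        for x, w in zip(inputs, row):
            acc += x * w  # one multiply-accumulate (MAC)
        outputs.append(acc)
    return outputs


def relu(values):
    """Replace negative values with zero, a common activation function."""
    return [max(0.0, v) for v in values]


# Arbitrary example values: 3 inputs feeding 2 output nodes
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0, 0.25],
    [1.0, 0.5, 0.5],
]
biases = [0.5, 0.0]

outputs = relu(dense_layer(inputs, weights, biases))
mac_count = len(inputs) * len(weights)

print(outputs)    # [0.0, 3.5]
print(mac_count)  # 6
```

This tiny layer needs 6 multiply-accumulates. The count grows with the number of inputs times the number of outputs, which is why larger models quickly demand faster processors.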
Helen Leigh discusses the computing requirements for TinyML in her “Machine Learning on Microcontrollers” article. In the article, I am quoted saying, “I like to have at least a 32-bit processor running at 80MHz with 50kB of RAM and 100kB of flash to start doing anything useful with machine learning.” Let’s look at things a little further. I’ve made a chart with basic guidelines for speed and memory requirements for a few machine learning applications. These recommended specifications come from personal experience and are negotiable.
Motion and Distance: Using machine learning to classify various gestures or perform regression on a series of distance measurements requires a relatively low-powered microcontroller. Often, a sample is a few values taken from a sensor at a rate of less than 1kHz. My go-to microcontroller for this application would be an ARM Cortex-M0+, such as the SAMD21 found on the Arduino Zero. However, some makers have successfully employed simple machine learning algorithms on even less powerful microcontrollers, such as the ATmega328P (Arduino Uno) or even the diminutive ATtiny85.
Sound and Voice: Recording and analyzing sounds often require more processing power. Usable human vocal frequencies generally lie between 300 and 3,000Hz, and a digital microphone needs to sample more than twice that rate to create an accurate waveform of the sound. As a result, processors need to be capable of sampling at a 6kHz minimum, which helps explain why 8kHz is a standard audio sampling rate. While an ARM Cortex-M0+ might work for analyzing audio, I usually reach for an ARM Cortex-M4 instead, such as the one found in the Arduino Nano 33 BLE Sense, to perform inference with vocal and non-vocal sounds.
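The sampling arithmetic above can be written out in a few lines. The list of common sample rates and the helper names are assumptions made for this sketch. The buffer calculation assumes 16-bit (2-byte) samples, which the text does not specify.

```python
# A few commonly used audio sample rates (assumed list for illustration)
COMMON_RATES_HZ = (8_000, 16_000, 44_100, 48_000)


def minimum_sample_rate(max_frequency_hz):
    """Sampling must exceed twice the highest frequency of interest."""
    return 2 * max_frequency_hz


def smallest_common_rate(max_frequency_hz, rates=COMMON_RATES_HZ):
    """Pick the lowest listed rate that exceeds the minimum."""
    for rate in sorted(rates):
        if rate > minimum_sample_rate(max_frequency_hz):
            return rate
    raise ValueError("no listed rate is high enough")


def buffer_bytes(sample_rate_hz, seconds, bytes_per_sample=2):
    """RAM needed to hold a window of raw audio samples."""
    return int(sample_rate_hz * seconds * bytes_per_sample)


voice_max_hz = 3_000
print(minimum_sample_rate(voice_max_hz))   # 6000
print(smallest_common_rate(voice_max_hz))  # 8000
print(buffer_bytes(8_000, 1.0))            # 16000
```

Holding one second of 8kHz audio at 2 bytes per sample already takes 16,000 bytes, a reminder that RAM matters for audio projects as much as clock speed does.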
Vision: Previously, using machine learning to analyze and classify images and videos required powerful desktop computers or servers. Thanks to recent advancements in microcontroller hardware and machine learning libraries, we can now run simplified vision inference applications on low-powered embedded systems. While a Cortex-M0+ or M4 might run a simple vision application, such as classifying single handwritten digits, I have found that more processing power is required to do anything beyond that. For example, the capable ARM Cortex-M7 found on the OpenMV Cam is an excellent place to start. It’s capable of running MicroPython and TensorFlow Lite to classify images and perform basic object detection. For higher resolution, faster frame rates, or more complex models, you will likely need to use single-board mini computers (e.g. Raspberry Pi), smartphones, or full laptop/desktop computers.
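The rule of thumb quoted earlier (a 32-bit processor at 80MHz with 50kB of RAM and 100kB of flash) is easy to turn into a quick checklist. In this sketch, the `Board` class, the `shortfalls` helper, and both example boards are hypothetical. As noted above, the thresholds are a negotiable starting point, so a shortfall is a prompt to look closer and not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Board:
    name: str
    word_bits: int
    clock_mhz: float
    ram_kb: float
    flash_kb: float


# The rule-of-thumb minimum quoted in the text
MINIMUM = Board("rule of thumb", word_bits=32, clock_mhz=80, ram_kb=50, flash_kb=100)


def shortfalls(board, minimum=MINIMUM):
    """Return the names of the specs that fall below the minimum."""
    fields = ("word_bits", "clock_mhz", "ram_kb", "flash_kb")
    return [f for f in fields if getattr(board, f) < getattr(minimum, f)]


# Hypothetical boards, not real products
small_board = Board("hypothetical 8-bit board", 8, 16, 2, 32)
large_board = Board("hypothetical 32-bit board", 32, 120, 256, 1024)

for board in (small_board, large_board):
    gaps = shortfalls(board)
    status = "meets the rule of thumb" if not gaps else "falls short on " + ", ".join(gaps)
    print(f"{board.name}: {status}")
```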
GETTING STARTED WITH TINYML
Diving into embedded machine learning can seem daunting. The math behind many machine learning algorithms is quite complicated, and the ability to write efficient code is often required to run such algorithms on resource-constrained devices. However, the tools listed above can help make the process easier by handling many of these complexities for you. You can also find numerous resources to help you learn to become a TinyML practitioner, such as courses and books from Andrew Ng and Pete Warden, as mentioned in Helen Leigh’s article.
I also recently released a course on Coursera in partnership with Edge Impulse: Introduction to Embedded Machine Learning. It works as a complement to the edX TinyML course, as it provides a shorter, broader overview of embedded machine learning concepts. Additionally, it relies on the Edge Impulse tool for hands-on projects to avoid getting bogged down with TensorFlow Lite Micro versions, settings, and code.
Finally, I recommend checking out the TinyML Foundation site, which is a growing community of professionals, researchers, and enthusiasts sharing new developments in the world of embedded machine learning. The site hosts forums, annual conferences, and a weekly virtual talk by prominent members and researchers.
Each month, new hardware, software, and tools enter the market to enable the creation of even more intelligent electronics. Anomaly detection and predictive maintenance promise to save millions of dollars in costly repairs to machinery. Sound classification and person detection can enable a new suite of security-focused IoT devices. Motion and acoustic pattern detection can help researchers track wildlife. Maker projects can take on new dimensions with the ability to respond to gestures and voice commands.
Embedded machine learning is still in its infancy, and it feels much like the early days of personal computing. No one is quite sure where this new technology will take us, but the possibilities are exciting.