This article appeared in Make: Vol. 91. Subscribe for more maker projects and articles!

The kazoo falls into a category of musical instrument that I would describe as “very easy to make noise, deceptively hard to sound good.” All you have to do to make a kazoo do its thing is to hum into it! But in order to play in tune, you have to hum in tune. In other words — you have to be able to sing well.

I recently watched my 2-year-old figure out how to get one buzzing and thought to myself: “Given the ubiquity and accessibility of this little plastic instrument, how could I make it easier to sound good?” And then I thought: “Well, how do we make it easier for singers to sound good?” Autotune. In 2024, we use autotune!

Now, when I say “autotune,” what I really mean is real-time pitch correction. Auto-Tune is a proprietary software audio plugin developed by the company Antares, and it is one specific implementation of real-time pitch correction: an algorithm that takes sound as its input, analyzes it to determine the most prominent frequency, and then modulates the sound so that the detected frequency matches one of the frequencies in a given musical scale. In other words, it constantly moves the pitch of an instrument or vocal track up or down so that it’s never out of tune.

So I embarked on a quest (and eventually succeeded!) to create a kazoo with “autotune” built in. I wanted it to look, play, and sound like a normal kazoo as much as possible, which meant no visible wires or obvious modifications. In my YouTube video I demonstrate each of the kazoo’s features (sometimes in an embarrassing fashion) and tell the complete story of how I arrived at the current design.

Here’s what I ended up with:

[Embedded video: watch the demo on YouTube.]

The kazoo

I searched “jumbo kazoo” on Google and bought the first thing that popped up. It ended up being a pack of these Chochkees brand 8″ kazoos from Amazon. I knew that, at least for the first prototype, I would want as much room as possible to fit the electronics inside. The size didn’t end up being too ridiculous so I never swapped it out.

The microcontroller

I knew I’d need something small and fast, since I would be piping audio in, processing it, and then piping it back out in almost real-time. I reached for the Seeed Studio Xiao ESP32-S3 because it’s super small, supports I2S for audio I/O, and can be clocked at 240MHz! As a bonus, it has built-in support for Bluetooth Low Energy, which I was able to leverage for some extra functionality — more on that later.

Speaker and amplifier

I ended up using one of these mini oval speakers and a MAX98357-based I2S amplifier from Adafruit. Both were perfectly sized to slide inside the mouth of the kazoo. I ripped out the kazoo’s buzzy plastic membrane (in the circular port at the top) and hot-glued the speaker in its place.

Battery and charger

I used a cylindrical 2200mAh lithium ion battery that I found on Adafruit. I ran that through SparkFun’s LiPo Charger/Booster board, which provides a steady 5V to power both the amplifier and microcontroller. It has a built-in on-off switch and Micro USB port for charging, both very handy for this project.

Pitch correction algorithm

I used an implementation of the Yin algorithm for fundamental frequency estimation. I’m going to be honest — I don’t understand how it works under the hood! I didn’t even really try to, because it worked so well right out of the box. It seems to be both fast and accurate.

For pitch correction, I made slight tweaks to RobSmithDev’s implementation, which was originally written to run on the (considerably less powerful!) Arduino Uno. But the Yin algorithm can only determine the frequency of what’s being played, not the frequency of the note that should be played. I added an additional step that takes the output of Yin and selects a desired frequency based on a set musical scale.

The microphone

This was the part of the project I spent the most time getting right, and it ultimately defined how the entire thing came together. After some false starts with analog electret microphones, I ended up using a cheap, generic INMP441 breakout module. At first it might seem like a bad choice — the INMP441 is an omnidirectional MEMS microphone, which means that it is designed to be relatively sensitive and pick up sound from all around. I needed it to pick up sound from my mouth and only my mouth, from a very short distance. In fact, I realized that the only way I was going to be able to isolate the sound of my humming completely from the speaker output and resonant kazoo body was to completely eliminate the distance between mouth and microphone — I decided to literally put the microphone inside my mouth!

To make this possible, I wrapped the INMP441 in soft foam to dampen the sound a bit, and then I 3D printed an enclosure to go around that to keep my drool out. Note that I am using PETG and hot glue for the enclosure but I don’t recommend putting either of those things in your mouth.

The trick

It turns out that if you stick a microphone in your mouth, run the sound through some pitch correction, and then out through a tiny speaker embedded within a kazoo … it doesn’t really sound like a kazoo anymore! So instead of using the actual audio from the microphone, the speaker actually emits a synthesized sawtooth wave playing at the frequency provided by the pitch-correction algorithm. A sawtooth waveform is very easy to generate and, frankly, sounds exactly like a perfectly played kazoo!

This solution was tough for me to swallow at first — it felt like cheating. The sound wasn’t really my voice! But I realized that I had tried it the “hard” way and it was just … worse. From a player/listener standpoint, the synthesized version sounded more like a kazoo, and it was way easier to control. It’s sometimes hard as an engineer to sacrifice technical “purity,” even when it doesn’t benefit your project in any way — especially when you share your work broadly. But ultimately, you can’t let the opinions of those who’ll fuss about how something is made prevent you from making something.

The bonus

Bluetooth! I mentioned earlier that the ESP32-S3 has built-in BLE support. Since my “Autotune Kazoo” project eventually morphed into a “voice controlled synthesizer” project, I decided to take full advantage. I added code to allow the kazoo to send MIDI note messages based on the output of the Yin algorithm. These are sent over BLE to a host computer and can be used to control any software synthesizer or sampler!

And the connection runs in the opposite direction as well — you can send MIDI notes out from your digital audio workstation and play them through the kazoo’s internal synth. It’s pretty hilarious to set the kazoo down and have it appear to play itself.

Using the BLE MIDI control capabilities of the kazoo to play a software synthesizer running on my Mac. This one sounds like the lead synth from Darude’s “Sandstorm” — check out the YouTube video to hear it.

Build Your Autotune Kazoo

Project Steps

1. Flash the Microcontroller

In the Arduino IDE, install Espressif’s Arduino-ESP32 support following the guide.

Download the kazoo code in the software directory from the repository, then open that directory as a project in the Arduino IDE. With your Xiao plugged into your computer, select Tools → Board → XIAO_ESP32S3, then click Upload.

2. Install the Speaker

Hot-glue the speaker in place of the kazoo’s original diaphragm. Everything else just kind of slides in and out for easy repair.

3. Connect the Electronics 

Following the wiring diagram, connect your speaker wires to the amplifier’s output terminals using a small screwdriver. I snipped off the little Molex connector first. Then solder the amplifier and microphone connections to the Xiao ESP32-S3 as shown.

Also solder the USB plug wires to the battery booster board’s 5V output terminals: red wire to positive (+) and black wire to ground (–). Plug the connector into the Xiao’s USB port to power it and the amplifier. Don’t forget to connect the Xiao’s antenna if you’re planning to use Bluetooth!

4. Connect the Battery 

Plug the battery’s JST connector into the booster board.

5. Assemble Your Kazoo

Plug the Xiao into the battery booster, then slide everything inside the kazoo. The battery goes in first and wedges into the slimmer end of the kazoo body. The amplifier and microcontroller go in after that, followed by the battery charger board and microphone — those need to be easy to reach.

Once you’re sure everything’s working, you can hot-glue the microphone module into its 3D-printed enclosure.

Conclusion

Annoyingly Perfect Pitch

Put the mic in your mouth and give your Autotune Kazoo a go. It will automatically turn your out-of-tune humming into an appropriately annoying, buzzy sawtooth wave — with perfect pitch! Here’s a breakdown of the project code in autotune_kazoo.ino.

The setup() initialization function (line 196) is straightforward:

  1. First we create a new instance of the SawtoothWaveGenerator (line 200). This is what produces the synthesized sound samples that get fed out to the speaker.
  2. Then we initialize the Yin algorithm (line 202) implementation. We pass it a float value (0.2f, for example) representing how strict we want it to be with the pitch detection. A lower value here means that the algorithm will only return a pitch estimation if it is extra confident. A higher value means that it will give you a guess more often, but it might not be as accurate.
  3. Next we initialize the Bluetooth MIDI library (line 204). We provide callback functions for when a connected host sends us MIDI data and tell it to start advertising and accepting connections.
  4. The rest of the setup function (lines 214–277) is just boilerplate to initialize the I2S sound streams: input from the microphone, and output to the speaker. Beware that the pin numbers may need to change if you wire things differently!

The main loop does the following:

  1. First, read a buffer (line 284) full of bytes from the microphone via I2S.
  2. Then loop through the samples (line 287) and convert them from float values down to signed 8-bit integers. This implementation of the Yin algorithm was optimized to work on 8-bit microcontrollers so that’s what it accepts as input.
  3. There are four buffers (line 23) that get filled up with these 8-bit samples, and each one has a different starting offset. This allows us to look at different “windows” of the incoming data and perform multiple pitch estimations on the same large chunk of incoming data. I found that the pitch estimation values were more accurate if I ran the algorithm more frequently and then averaged the outputs. So each time one of the four sub-buffers gets filled (line 303), the Yin algorithm is re-run and averaged with the last few estimations.
  4. Each time a new pitch estimation value is available, we make sure it is in a valid range (line 112). The Yin algorithm returns -1 if it can’t make a good guess, and realistically nobody is humming into a kazoo at a frequency higher than a few hundred hertz, so to be conservative we ignore all values outside the range (0, 1500). (Side note: Apparently the world record for highest note sung was roughly 25,000Hz — well beyond the limits of human hearing!)
    1. If the pitch estimation value is too far away (line 135) from the previous readings (more than 90Hz), we don’t do anything! We assume this was an error, or the very beginning of a new frequency. If it is a valid new frequency, the next reading will confirm it. We have a little time before we need to change the output; changing back and forth too quickly will sound bad.
    2. Otherwise, if the pitch estimation is outside of the valid range (line 139), we assume the user has stopped humming! We set our activePitch variable to -1, and we send a Note Off message to any connected MIDI devices.
    3. Otherwise, we assume we have a valid frequency! We convert that frequency (line 146) to a musical note. If that note is not included (line 149) in our targeted musical scale, we find the next closest note, either above or below the current one. We convert that note back into a frequency and update the SawtoothWaveGenerator with that value. We set our activePitch variable (line 157) to the updated frequency. If this note is different from the last one we detected, we send a quick Note Off MIDI message to any connected hosts for the previous note, followed by a new Note On message for the updated one.
  5. If there is a new MIDI Note On message (line 79) sent from a connected host, we override the activePitch setting from step 4 with the MIDI note. MIDI always takes precedence over the microphone.
  6. After all the samples are processed, we look at the value of our activePitch (line 309) variable. If it is set to -1, we fill our sound output buffer with silence. If it is set to a valid frequency, we populate it with a chunk of samples from our SawtoothWaveGenerator object. That’s our sound going out to the speaker!
  7. We write the sound output buffer (line 313) to the I2S output stream.
  8. Finally, we poll (line 314) to see if there are any new MIDI messages waiting to be processed.

Photography by Guy Dupont.
