“Text-to-movie” site for total in-browser robo-cartoon making

Last Friday an old friend sent me a link to this extremely amusing animated short called So you Want to Get a PhD in the Humanities, uploaded by YouTube user MinnieMouse1224. I assume that she or he is the author.

Besides being hilarious (and in deference to doctors of humanities, I’d say most of the jibes equally apply to science Ph.D. experiences) the short is notable in that it was created entirely without sets, actors, voice-overs, or recording equipment of any kind. All that was involved was a very clever writer typing lines of dialogue into a browser window (OK, maybe a downloadable client, but whatevs) over at XtraNormal. Their software generates spoken dialogue using TTS algorithms and animates the characters’ mouths to match. A simple scripting interface allows for a fairly broad selection of sets, camera angles, character models, voices, expressions, gestures, background sounds, and music. A limited set of these options is available to try for free; most of the interesting ones require payment using a dedicated currency kinda like “Microsoft Points” on the Xbox 360. (To which, a resounding booooooo.)

Those of you who were watching the Colbert Report last Monday night probably caught this Geico commercial that was reportedly made using XtraNormal. It’s called Superheroes, and is also pretty dang funny.

Both videos manage to make a Steven-Wright-style comedic virtue of the necessity for flat, utterly expressionless “performances” imposed by XtraNormal’s nascent text-to-dialogue technology. My own, dramatically less successful attempt at a similar effect is here. The 56 seconds of my test video required about five hours of editing, over the course of the weekend, most of which was spent fiddling with details of punctuation, capitalization, and hyphenation to achieve even a minimal amount of natural inflection and expression in my virtual actor’s lines.


This is a screenshot of the web-based video scripting interface. Although the site is still slightly wonky (and their downloadable client, State, even more so, in my experience), the editor really is remarkably powerful given its limited feature set. As an astute commenter in their forums has observed, XtraNormal’s platform would greatly benefit from the adoption of a Microsoft-style markup language for TTS inflection and phrasing.
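
For a rough idea of what inflection markup of that kind can look like, here is a minimal Python sketch that wraps one line of dialogue in W3C SSML tags (prosody for rate and pitch, break for a pause). SSML is used here as a stand-in for illustration; it is not necessarily the Microsoft markup the commenter had in mind, and nothing suggests XtraNormal accepts it. The line_to_ssml helper and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def line_to_ssml(text, rate=None, pitch=None, pause_ms=0):
    """Wrap one line of dialogue in a simplified SSML <speak> document.

    Hypothetical helper for illustration only. A fully valid SSML
    document would also declare the SSML XML namespace and a language;
    both are omitted here to keep the sketch short.
    """
    speak = ET.Element("speak", {"version": "1.0"})

    # <prosody> carries the inflection hints (speaking rate and pitch).
    attrs = {}
    if rate:
        attrs["rate"] = rate      # e.g. "slow", "medium", "fast"
    if pitch:
        attrs["pitch"] = pitch    # e.g. "low", "medium", "high"

    target = ET.SubElement(speak, "prosody", attrs) if attrs else speak
    target.text = text

    # <break> inserts a pause after the line.
    if pause_ms > 0:
        ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})

    return ET.tostring(speak, encoding="unicode")


ssml = line_to_ssml("So you want to get a PhD.", rate="slow", pitch="low", pause_ms=400)
print(ssml)
```

With something like this, the pacing of a line would be stated explicitly in the script instead of being coaxed out of the synthesizer with punctuation, capitalization, and hyphenation tricks.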

My video cost 37 XP to produce, because the “Waterloo” set is not among the freely available scenes. 300 points was the minimum purchase, and set me back five bucks, which means I paid 62¢ to publish it. (That’s assuming I find some use for my remaining 263 points, which are, of course, non-transferable.)
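
The arithmetic behind that figure, as a quick Python sketch. The cost_per_video function name is made up for illustration; the numbers are the ones quoted above.

```python
def cost_per_video(points_used, points_bought, price_usd):
    """Return (dollars spent on this video, points left over)."""
    price_per_point = price_usd / points_bought
    return points_used * price_per_point, points_bought - points_used


cost, leftover = cost_per_video(points_used=37, points_bought=300, price_usd=5.00)
print(f"{cost * 100:.0f} cents, {leftover} points left")  # prints "62 cents, 263 points left"
```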

And although it remains to be seen if XtraNormal’s business model is going to survive the rising groundswell of interest in the technology, the advent of their ubiquitous text-to-movie software lifts the very last remaining entry barrier to indie movie-makers, which is itself a significant milestone. There can be little doubt we’re going to see a huge explosion of these robo-cartoons in the near future, and correspondingly rapid improvements in the emotive abilities of the speech synthesizers. I’m curious, too, about what the new noun is going to be/already is. Has anybody heard it yet? [Thanks, Maya!]


I am descended from 5,000 generations of tool-using primates. Also, I went to college and stuff. I am a long-time contributor to MAKE magazine, and my work has also appeared in ReadyMade, c't – Magazin für Computertechnik, and The Wall Street Journal.

View more articles by Sean Michael Ragan