“Text-to-movie” site for total in-browser robo-cartoon making


Last Friday an old friend sent me a link to this extremely amusing animated short called So you Want to Get a PhD in the Humanities, uploaded by YouTube user MinnieMouse1224. I assume that she or he is the author.

Besides being hilarious (and with all due deference to doctors of humanities, I’d say most of the jibes apply equally to science Ph.D. experiences), the short is notable in that it was created entirely without sets, actors, voice-overs, or recording equipment of any kind. All that was involved was a very clever writer typing lines of dialogue into a browser window (OK, maybe a downloadable client, but whatevs) over at XtraNormal.com. Their software generates spoken dialogue using text-to-speech (TTS) algorithms and animates the characters’ mouths to match. A simple scripting interface allows for a fairly broad selection of sets, camera angles, character models, voices, expressions, gestures, background sounds, and music. A limited set of these options is available to try for free; most of the interesting ones require payment using a dedicated currency kinda like “Microsoft Points” on the Xbox 360. (To which, a resounding booooooo.)

Those of you who were watching The Colbert Report last Monday night probably caught this Geico commercial, which was reportedly made using XtraNormal. It’s called Superheroes, and is also pretty dang funny.

Both videos manage to make a Steven-Wright-style comedic virtue of the necessity for flat, utterly expressionless “performances” imposed by XtraNormal’s nascent text-to-dialogue technology. My own, dramatically less successful attempt at a similar effect is here. The 56 seconds of my test video required about five hours of editing, over the course of the weekend, most of which was spent fiddling with details of punctuation, capitalization, and hyphenation to achieve even a minimal amount of natural inflection and expression in my virtual actor’s lines.


This is a screenshot of the web-based video scripting interface. Although the site is still slightly wonky (and their downloadable client, State, even more so, in my experience), the editor really is remarkably powerful given its limited feature set. As an astute commenter in their forums has observed, XtraNormal’s platform would greatly benefit from adopting a Microsoft-style markup language for TTS inflection and phrasing.
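
To give a rough idea of what that kind of markup might look like, here is a minimal Python sketch that wraps lines of dialogue in W3C SSML tags (`speak`, `s`, `emphasis`, `break`), a speech markup that Microsoft's speech platforms also accept. The `to_ssml` helper and its tuple format are hypothetical, invented purely for illustration; nothing here reflects XtraNormal's actual script format.

```python
import xml.etree.ElementTree as ET


def to_ssml(segments):
    """Build an SSML string from (text, emphasis_level, pause_ms) tuples.

    Hypothetical helper for illustration only. emphasis_level is an SSML
    emphasis level such as "strong" or "reduced", or None for a plain
    sentence. pause_ms is the pause to insert after the segment (0 for none).
    """
    speak = ET.Element("speak", version="1.0")
    for text, emphasis_level, pause_ms in segments:
        if emphasis_level:
            # Ask the synthesizer to stress this phrase.
            node = ET.SubElement(speak, "emphasis", level=emphasis_level)
        else:
            # Plain sentence, no special inflection requested.
            node = ET.SubElement(speak, "s")
        node.text = text
        if pause_ms:
            # Explicit pause, instead of fiddling with punctuation.
            ET.SubElement(speak, "break", time=f"{pause_ms}ms")
    return ET.tostring(speak, encoding="unicode")


ssml = to_ssml([
    ("So you want to get a PhD in the humanities.", None, 400),
    ("No.", "strong", 0),
])
print(ssml)
```

With markup along these lines, the stress and the pause are stated outright rather than coaxed out of the synthesizer through hyphens and capital letters.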

My video cost 37 XP to produce, because the “Waterloo” set is not among the freely available scenes. 300 points was the minimum purchase, and set me back five bucks, which means I paid 62¢ to publish it. (That’s assuming I find some use for my remaining 263 points, which are, of course, non-transferable.)
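
For anyone who wants to check the math, here is a quick Python sketch of the arithmetic above. The constant names are my own, and it assumes the five dollars is spread evenly across all 300 points.

```python
POINTS_PURCHASED = 300    # minimum purchase, in points
PRICE_PAID_USD = 5.00     # what that minimum purchase cost
VIDEO_COST_POINTS = 37    # what the video cost to publish

# Effective price of a single point, assuming an even spread.
usd_per_point = PRICE_PAID_USD / POINTS_PURCHASED

# Dollar cost of the video, and the points left over afterwards.
video_cost_usd = VIDEO_COST_POINTS * usd_per_point
points_left = POINTS_PURCHASED - VIDEO_COST_POINTS

print(f"Video cost: {video_cost_usd * 100:.0f} cents")  # 62 cents
print(f"Points left: {points_left}")                    # 263
```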

And although it remains to be seen if XtraNormal’s business model is going to survive the rising groundswell of interest in the technology, the advent of their ubiquitous text-to-movie software lifts the very last remaining entry barrier to indie movie-makers, which is itself a significant milestone. There can be little doubt we’re going to see a huge explosion of these robo-cartoons in the near future, and correspondingly rapid improvements in the emotive abilities of the speech synthesizers. I’m curious, too, about what the new noun is going to be/already is. Has anybody heard it yet? [Thanks, Maya!]

10 thoughts on ““Text-to-movie” site for total in-browser robo-cartoon making”

  1. RocketGuy says:

    My wife sent me the iPhone xtranormal cartoon (“I don’t care” etc).

    Since we had just talked about me getting a new phone, with, shall we say, “differing opinions,” I immediately made a movie on the topic and sent it to her.

    She watched repeatedly, and apparently it’s the funniest thing I ever did.

    Since it took me 15 minutes, this either means:

    A) Great Technology,
    B) I’m not funny in general,
    C) Robotic Voice profanity is intrinsically funny,
    D) Truth obscured by 3D graphics hurts less,
    Or probably E) all of the above.

    Have fun!

    1. RocketGuy says:

      Up front warning here, profanity is robotically delivered but present. Shield the innocent. Or not.

      Also, since you’re not my wife, and it’s out of context, it’s probably not going to be all that funny to you. Sorry.

      “But Honey, it’s 4G!”

      1. Sean Michael Ragan says:

        I was actually sort of nervous when I got your comment because I was anticipating having to smile and nod politely, but that was, in fact, pretty funny. Thanks for posting it. And I think there’s more going on than just the robotic profanity effect. I have seen a bunch of senselessly vulgar robo-vids on XtraNormal over the weekend, and I can tell you that, by itself, that gets old real real fast.

        1. RocketGuy says:

          Maybe I’ve just found my media…

  2. Steve Hoefer says:

    I think the noun is probably “Machinima”. Though most of the genre repurposes game engines to make movies and rarely uses TTS, it has a lot in common with it. (Computer generated images, low budget, low barrier to entry, virtual actors, etc.)

I am descended from 5,000 generations of tool-using primates. Also, I went to college and stuff. I am a long-time contributor to MAKE magazine and makezine.com. My work has also appeared in ReadyMade, c't – Magazin für Computertechnik, and The Wall Street Journal.

View more articles by Sean Michael Ragan