Shaun Friedle created an impressive piece of JavaScript that can automatically defeat the CAPTCHAs used by the Megaupload file hosting service. While Megaupload’s CAPTCHAs are particularly weak, the script still breaks into new territory: JavaScript-based optical character recognition. John Resig posted a breakdown of how the software works. Here’s the quick summary:
- The HTML5 Canvas getImageData API is used to get at the pixel data from the CAPTCHA image. Canvas lets you draw an image onto a canvas, from which you can later read the pixel data back out.
- The script includes an implementation of a neural network, written in pure JavaScript.
- The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used, a crude form of Optical Character Recognition (OCR).
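To make those three steps concrete, here is a minimal sketch in JavaScript. It is not Shaun's code: the function names (`toGrayscale`, `layer`, `classify`), the grayscale conversion, and the hand-picked weights are all assumptions for illustration, and a hand-built two-pixel image stands in for what getImageData would return in a browser.

```javascript
// Convert RGBA pixel data (the layout getImageData returns: four bytes
// per pixel) into one brightness value per pixel, scaled to roughly 0..1.
// The luma weights are a common convention, assumed here for illustration.
function toGrayscale(imageData) {
  const { data, width, height } = imageData;
  const gray = new Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = data[i * 4];
    const g = data[i * 4 + 1];
    const b = data[i * 4 + 2];
    gray[i] = (0.299 * r + 0.587 * g + 0.114 * b) / 255;
  }
  return gray;
}

// One fully connected layer with a sigmoid activation.
// weights[j] holds the input weights for output neuron j.
function layer(inputs, weights, biases) {
  return weights.map((row, j) => {
    const sum = row.reduce((acc, w, i) => acc + w * inputs[i], biases[j]);
    return 1 / (1 + Math.exp(-sum));
  });
}

// Feed the inputs through every layer in order and return the index of
// the strongest output neuron, i.e. the network's guess.
function classify(inputs, layers) {
  const outputs = layers.reduce(
    (acc, l) => layer(acc, l.weights, l.biases),
    inputs
  );
  return outputs.indexOf(Math.max(...outputs));
}

// In a browser, the pixel data would come from something like:
//   const ctx = canvas.getContext("2d");
//   ctx.drawImage(img, 0, 0);
//   const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
// Here a hypothetical 2x1 image stands in: one black pixel, one white pixel.
const fakeImage = {
  width: 2,
  height: 1,
  data: new Uint8ClampedArray([0, 0, 0, 255, 255, 255, 255, 255]),
};

// Toy single-layer network with made-up weights (a real one is trained).
// Output 0 fires when the left pixel is bright, output 1 when the right is.
const toyLayers = [{ weights: [[5, -5], [-5, 5]], biases: [0, 0] }];

const features = toGrayscale(fakeImage);
const guess = classify(features, toyLayers);
console.log(features, guess);
```

A real solver would first split the CAPTCHA into individual characters and use weights learned from labeled examples; the toy weights above only show how pixel values flow through the network to produce a guess.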
Shaun designed the software as a Greasemonkey script that will break CAPTCHAs for Megaupload and automatically trigger a download. The code is designed specifically for this CAPTCHA style, but there’s no reason why the getImageData trick combined with an alternate OCR implementation couldn’t be used to solve other CAPTCHA systems. This is pretty fascinating stuff.
Is there a better (more convenient, harder to cheat) way to prove humanness? What else could you make in JavaScript using OCR, neural nets, or per-pixel image processing?
Megaupload Auto-fill CAPTCHA
MuCaptcha Online Demo
OCR and Neural Nets in JavaScript – John Resig