- The HTML 5 Canvas getImageData API is used to get at the pixel data from the Captcha image. Canvas gives you the ability to embed an image into a canvas (from which you can later extract the pixel data back out again).
- The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used – in a sort of crude form of Optical Character Recognition (OCR).
Shaun designed the software as a Greasemonkey script that will break CAPTCHAs for Megaupload and automatically trigger a download. The code is designed specifically for this CAPTCHA style, but there’s no reason why the getImageData trick combined with a alternate OCR implementation couldn’t be used to solve for other systems. This is pretty fascinating stuff.