reCAPTCHA: distributed book digitization while fighting spam


Thanks to spammers, we are now forced to waste a substantial portion of time every day typing in obfuscated, wiggly letters to prove we are human. reCAPTCHA is a slick idea for using the CAPTCHA system to do something productive (…besides distinguishing between Homo sapiens and Homo computatralis).

With reCAPTCHA, the user is given two words: one already known to the system and one from a book that previously failed optical character recognition (OCR). When the user enters both words, the system verifies the known word, proving human-ness, and submits the second word to a central database, which helps digitize books from the Internet Archive. With 60 million CAPTCHAs being solved every day, this could be a huge help for portions of text that OCR can't handle.
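The two-word check can be sketched roughly as follows. This is a minimal illustration, not reCAPTCHA's actual implementation: the post doesn't say how the unknown word's transcription is confirmed, so the vote counting and the `AGREEMENT_THRESHOLD` value are assumptions, and `WordPairChallenge`, `submit`, and the word IDs are hypothetical names.

```python
from collections import Counter, defaultdict

# Assumed (hypothetical) rule: accept a transcription of an unknown word
# once this many users who passed the control check agree on it.
AGREEMENT_THRESHOLD = 3


class WordPairChallenge:
    """Sketch of the two-word scheme: one control word with a known answer,
    plus one word that OCR failed to read."""

    def __init__(self):
        # unknown word ID -> Counter of readings from users who passed the check
        self.guesses = defaultdict(Counter)
        # unknown word ID -> transcription that reached the threshold
        self.accepted = {}

    def submit(self, control_answer, expected_control, unknown_word_id, unknown_answer):
        # Only the known (control) word decides whether the user is human.
        if control_answer.strip().lower() != expected_control.lower():
            return False
        # The user passed, so count their reading of the unknown word as a vote.
        votes = self.guesses[unknown_word_id]
        votes[unknown_answer.strip().lower()] += 1
        word, count = votes.most_common(1)[0]
        if count >= AGREEMENT_THRESHOLD:
            self.accepted[unknown_word_id] = word
        return True


challenge = WordPairChallenge()
challenge.submit("overlooks", "overlooks", "scan-17", "morning")  # passes, 1 vote
challenge.submit("overlaps", "overlooks", "scan-17", "morning")   # fails, no vote
```

Weighting only answers from users who got the control word right is what lets the human-verification step double as a filter on transcription quality.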


0 thoughts on “reCAPTCHA: distributed book digitization while fighting spam”

  1. tpe says:

    Client A then attempts to connect to a range of ports on client B’s machine. All these requests will fail at client B’s firewall, of course. However, in the process a side effect has occurred. Client A has told its own firewall to allow traffic from all of client B’s scanned ports! Now, when client B attempts to connect to client A, assuming its outgoing port was previously scanned (which it likely will be), the request will get through to client A’s machine.

    I don’t think this is correct. Portscanning a remote system won’t “tell the firewall to allow traffic from […] the scanned ports”. This would be a really lame way to firewall. Instead, firewalls and NAT systems work on established sockets. Another socket connection that is utilizing the same port won’t magically get through just because that port has been used recently.

  2. jason_striegel says:

    Think about how a connection is created in a typical scenario:

    You send a packet from port 1234 to port 80. When this packet goes through your NAT router, the router creates a lookup-table entry recording that your port 1234 is communicating with the outside server's port 80. The server then responds with a packet from port 80 to port 1234. When your NAT router sees this, it checks the lookup table for a matching entry and decides whether to forward the packet (which it does in this case) or discard it.

    So essentially, sending a packet to a machine on the other side of your NAT router causes the router to allow incoming packets from that machine, with the specific from and to ports that the original packet contained.

    When two clients, each behind its own NAT router, need to talk to each other, they can coordinate a set of ports through a third-party public server. Then it’s a matter of punching holes through their own routers until packets start coming through and both sides see traffic.

    FYI, this is much easier with UDP (as opposed to TCP), as there are no sequence numbers to deal with.

  3. -=MaGGuS=- says:

    Why does client A scan a range of client B’s ports? Why not use just one port?

  4. says:

    How long does the hole in the firewall stay open?

    Does anyone know a good article about punching a hole in a firewall using TCP?


  5. ricky says:

    How can I view a webcam and make PC-to-PC calls in Yahoo Messenger if I’m only connected through a proxy server? Why can I do both in Skype?
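The NAT behavior jason_striegel describes, together with tpe's point that a mapping only admits traffic matching the exact flow that created it, can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how any particular router works. It assumes a port-restricted NAT that keeps the inside port unchanged on the public side and never expires its entries. Real NATs usually rewrite ports and time mappings out, which is why the question of how long a hole stays open matters. The `Nat` class, `punch` function, and addresses (from the RFC 5737 documentation ranges) are hypothetical, and the rendezvous server that tells each peer the other's public address is not modeled.

```python
class Nat:
    """Toy port-restricted NAT: an inbound packet is forwarded only if it
    matches a mapping created by an earlier outbound packet."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        # Entries of (inside_port, remote_ip, remote_port) created by outbound traffic.
        self.table = set()

    def outbound(self, inside_port, remote_ip, remote_port):
        # Sending a packet records the exact local port / remote address / remote port.
        self.table.add((inside_port, remote_ip, remote_port))
        # Simplification: the public port is the same as the inside port.
        return (self.public_ip, inside_port)

    def inbound(self, src_ip, src_port, dst_port):
        # Forward only if this packet matches an existing mapping.
        return (dst_port, src_ip, src_port) in self.table


def punch(nat_a, port_a, nat_b, port_b):
    """Simulate UDP hole punching once both peers know each other's public address."""
    # A sends to B: this opens A's NAT, but B's NAT drops the packet.
    nat_a.outbound(port_a, nat_b.public_ip, port_b)
    first = nat_b.inbound(nat_a.public_ip, port_a, port_b)
    # B sends to A: this opens B's NAT and matches the entry A already created.
    nat_b.outbound(port_b, nat_a.public_ip, port_a)
    second = nat_a.inbound(nat_b.public_ip, port_b, port_a)
    # Now A's packets get through B's NAT as well.
    third = nat_b.inbound(nat_a.public_ip, port_a, port_b)
    return first, second, third


a = Nat("203.0.113.1")
b = Nat("198.51.100.2")
print(punch(a, 4000, b, 5000))
# A packet from a port that was never contacted is still dropped.
print(a.inbound("198.51.100.2", 5001, 4000))
```

Running this prints `(False, True, True)` and then `False`. The first packet is lost, but it leaves behind the mapping that lets the reply through. A packet from a different source port has no matching entry and is still dropped, which is consistent with tpe's objection.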
