
14/09/2007

Gotcha

Those captcha things are everywhere these days. You know, the little pictures of random letters that you have to identify before you can post a comment or display a web page. There's even one on this blog, courtesy of Google/Blogger - if you want to comment on these posts of mine, you'll be expected to identify terribly tortured type and gruesomely garbled glyphs before your thoughts can be published.

I didn't realise that 'CAPTCHA' is an acronym - it stands for Completely Automated Public Turing test to tell Computers and Humans Apart (a struggle, but they made it work, just). It's even been trademarked by Carnegie Mellon University.

To be honest, I'm not a big fan of typographical captchas. I can understand their purpose, and they're acceptable as part of the ongoing fight against spammers. But I don't feel good when I'm typing nonsense into little boxes. Letters can be beautiful things that help us communicate, and these captchas are ugly reminders of an ugly reality. The task of responding to a captcha is the McJob of the internet browsing world - a sad and mindless task that it usually isn't worth teaching computers to do.

(It turns out that many of these captchas are now crackable by dumb computers and their clever programmers: see here and here, for example.)

Another problem with captchas is that they're hard for people who have vision problems. It's sad that computers can sometimes solve things that they're not supposed to, while humans sometimes can't solve things that were made for them.

Unfortunately, it's hard to think of anything more pleasant that offers any kind of resistance to automated attacks. Given enough time, most simple text-based captchas can be overcome, particularly if the rewards are sufficient. Some spammers presumably spend days cracking challenging Yahoo and Hotmail captchas so that they can send millions of spam emails from temporary accounts.

But something is better than nothing, and it's fun to try and devise a simple question/answer method that's just challenging enough to deter the idle spammer.

My first attempt, for verifying comments posted to a newLISP wiki, was fun, but probably not very effective as a security measure. Here's the basic idea, with a small sample data set:

; each sublist is one category of words
(set 'data '(
  ({apple} {pear} {cherry} {mango} {peach} {lime} {strawberry} {kumquat})
  ({onion} {carrot} {potato} {bean} {pepper} {cucumber})
  ({red} {ultramarine} {pink} {green} {blue} {turquoise})
  ({bach} {beethoven} {zappa} {liszt} {mozart} {lennon} {brahms})
  ({london} {paris} {chicago} {rome} {athens} {moscow} {beijing})
  ({oxygen} {lead} {plutonium} {calcium} {cobalt} {strontium})
  ({elephant} {mouse} {lion} {toad} {frog} {slug})
  ({ls} {cat} {vi} {ps} {echo} {man} {ed} {diff} {troff})))

; seed the random number generator from the clock
(seed (date-value))

(define (generate)
  ; shuffle the categories and take the first two
  (map set '(odd-one-list others-list) (randomize data))
  ; one word from the first category is the odd one out
  (set 'odd-one (first (randomize odd-one-list)))
  ; three words from the second category keep it company
  (map set '(other1 other2 other3) (randomize others-list))
  ; shuffle the four words and join them with spaces
  (join (randomize (list odd-one other1 other2 other3)) { }))

The question is something like "what's the odd one out?", followed by the result of (generate). For example:

  zappa beethoven lennon cobalt

The user then has to type the odd one out before the comment is posted. I liked the strange poetry that this generated:

cat vi oxygen ls
athens moscow lion chicago
rome oxygen lead calcium
lime cherry ultramarine pear
bach ultramarine zappa mozart

and it wouldn't be an unpleasant task to answer such a simple question. I also like the idea that you could adjust the difficulty level by changing the data lists, not just by adding more obscure elements from the periodic table and some more unusual Unix commands, but by adding lists of newLISP functions, or famous brands of whisky. It's tempting, but probably unjustified, to think that spammers are stupid and would have trouble answering questions like these.
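Checking the reply is the easy part. Here's a minimal sketch, assuming (generate) has just run and left the right answer in the global odd-one. The name correct? is made up for this illustration and isn't part of the wiki code; the stand-in value below just makes the snippet run on its own.

```newlisp
; hypothetical helper: compares the visitor's reply with the global
; odd-one that (generate) sets, ignoring case and stray spaces
(define (correct? answer)
  (= (lower-case (trim answer)) odd-one))

; stand-in for a call to (generate) that asked "zappa beethoven lennon cobalt"
(set 'odd-one {cobalt})

(correct? { Cobalt })   ; true
(correct? {zappa})      ; nil
```

Trimming and lower-casing the reply is a design choice: it seems unkind to reject a human for a stray space or a capital letter.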

A subtle problem to avoid is that some words belong to two categories - cat, for example. But a more serious deficiency is that, if the question is multiple choice, and the n options are easily identified, the spammer has a 1 in n chance of getting the solution just by choosing an answer at random. So it's a good idea to make sure that the right answer isn't on display at all. Finally, though, the most serious drawback is that it would be easy to solve these simple questions by writing a short program, assuming that there's not already an odd-one-out server. (Any takers? - it'd be an interesting task, given that the data lists themselves, and the complete list of categories, would not be immediately apparent to the would-be malefactor and would emerge only after repeated attempts.)
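For the curious, here's a rough sketch of how such a program might learn the categories without ever seeing the data lists. It's a toy under stated assumptions rather than a tested attack: the names Seen, pair-key, observe, score and guess are all invented for this illustration, and it assumes a newLISP recent enough to have Tree hashes. It counts how often each pair of words turns up in the same question. After enough questions, words from the same category will have been seen together far more often than strangers, so the word with the lowest total is probably the odd one out.

```newlisp
; hypothetical odd-one-out solver that knows nothing about the data lists
(new Tree 'Seen)              ; hash: "word1 word2" -> times seen together

(define (pair-key a b)        ; same key whichever order the words arrive in
  (join (sort (list a b)) { }))

(define (observe question)    ; record every pair of words in one question
  (let (words (parse question { }))
    (dolist (a words)
      (dolist (b words)
        (if (< a b)
          (Seen (pair-key a b) (+ 1 (or (Seen (pair-key a b)) 0))))))))

(define (score word words)    ; how often word has kept company with the rest
  (apply + (map (fn (w) (if (= w word) 0 (or (Seen (pair-key word w)) 0))) words)))

(define (guess question)      ; the word with the lowest score is the stranger
  (let (words (parse question { }))
    (last (first (sort (map (fn (w) (list (score w words) w)) words))))))

; watch a few questions go by, then try a new one
(dolist (q '({zappa beethoven lennon cobalt}
             {bach ultramarine zappa mozart}
             {rome oxygen lead calcium}
             {athens moscow lion chicago}))
  (observe q))

(guess {lennon zappa calcium beethoven})   ; "calcium"
```

With only a handful of observations there will be plenty of ties, which this sketch breaks alphabetically, so it's only as good as the number of questions it has watched go by.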

So I've started thinking about alternatives where the answer isn't visible on the web page. No luck yet - arithmetic sequences with the last number missing are a bit dull!
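Dull or not, the sequence idea is easy to sketch, and it does keep the answer off the page. The following is only an illustration: sequence-question, right-number? and the global expected are names invented here, and a real version would pick the start and step at random.

```newlisp
; hypothetical helper: builds a four-term question and remembers the
; answer in the global expected, which never leaves the server
(define (sequence-question start step)
  (set 'expected (string (+ start (* 4 step))))
  (string (join (map (fn (i) (string (+ start (* i step)))) '(0 1 2 3)) { }) { ?}))

; hypothetical helper: compares the visitor's reply with the stored answer
(define (right-number? answer)
  (= (trim answer) expected))

; in real use the start and step would be random, for example:
; (sequence-question (+ 1 (rand 20)) (+ 2 (rand 8)))
(sequence-question 3 4)    ; "3 7 11 15 ?"
(right-number? {19})       ; true
```

Of course, a program can work out the next term of an arithmetic sequence even more easily than a person can, so this hides the answer without making it hard to find.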

2 Comments:

At 02:06, Anonymous Jeff Ober said...

Find a few dozen small graphics. Map their filenames to a few symbols describing them and then match them together by symbols, with one that does not match the symbols of any of the others.

 
At 23:30, Blogger newdep said...

What is the capital city of G.B.?
or
1234 * 12345 = 344523445?
or
How many colors do you see here?
or
what is the current GMT?...

It's all there to be published... a little
harder to crack... but it's safer than what's available now...

...interesting topic to think about ;-)
Norman.

 
