OntoSearch

Ontology Search Engine

Web Design Articles – Captcha: Concept & Interpretation

April 26th, 2011

Admin

CAPTCHAs are normally one or two words presented as graphics, overlaid with some type of distortion, and they function as a test that relies on your human ability to recognize them. CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” The CAPTCHA technique was pioneered by developers at Carnegie Mellon University. The idea behind it was to develop a means of distinguishing between people and web robots, so that websites could offer their resources to individual humans without being exploited by robots.

The Need for CAPTCHA

Site owners face a number of special challenges in protecting their resources from automated harvesting. These include:

Resources may be costly to provide, and machines can consume far more data far more quickly than humans. As a result, services that are machine-accessible might prove prohibitively costly to maintain.

Permitting bots to post comments and user-generated content opens a floodgate for spammers, which inevitably results in massive volumes of spam — often to the point where a service becomes unusable.

Data could be highly sensitive, such as personal medical or financial details, and needs to be sufficiently protected against attacks from data-mining robots.

Interactions with a system may have fundamental implications for society as a whole; consider the problems that would arise in the case of electronic voting.

The Issue with CAPTCHA

CAPTCHA systems create a substantial accessibility barrier, since they require the user to be able to see and understand shapes that might be quite distorted and challenging to read. A CAPTCHA is therefore difficult or impossible for people who are blind or partially sighted, or who have a cognitive disability such as dyslexia, to translate into the plain text box.

And of course there can be no plain-text equivalent for such an image, because that alternative would be readable by machines and would therefore undermine the original purpose.

Since users with these disabilities are unable to perform critical tasks, such as creating accounts or making purchases, the CAPTCHA system can clearly be seen to fail this group.

Such a system is also eminently crackable. A CAPTCHA can be read by suitably sophisticated scanning and character recognition software, such as that employed by postal systems the world over to recognize handwritten zip or postal codes. Alternatively, images can be aggregated and fed to a human, who can manually process thousands of such images in a day to produce a database of recognized images, which can then be easily identified.

Recent high-profile cases of bots cracking the CAPTCHA system on Windows Live Hotmail and Gmail have highlighted the issue, as spammers created thousands of bogus accounts and flooded the systems with junk. Even more recently, security firm Websense Security Labs has reported that the Windows Live CAPTCHA can be cracked in as little as 60 seconds.

One CAPTCHA-cracking project, known as PWNtcha (“Pretend We’re Not a Turing Computer but a Human Antagonist”), reports success rates between 49% and 100% at cracking some of the most common systems, including 99% for the system used by LiveJournal and 88% for that employed by PayPal.

Thus, the growth and proliferation of CAPTCHA systems should be taken less as evidence of their success than as evidence of the human propensity to be comforted by things that offer a false sense of security.

It’s ironic that CAPTCHA can be defeated by those who are sufficiently motivated, when they’re the very same people the test is designed to protect against. Just like DRM, CAPTCHA systems ultimately fail to protect against the original threat, while simultaneously inconveniencing ordinary users.

Alternatives to CAPTCHA

Not using CAPTCHA at all would increase the volume of unwanted traffic, sometimes to an unmanageable extent.

Clearly there’s a need for something. So what are the alternatives to CAPTCHA?

Non-linguistic Visual Tests

Tests that use images other than words may be generally easier for users, since all they have to do is understand an undistorted picture, rather than decode distorted language.
In one such test, the system shows you a set of nine images, three of which are kittens. You have to identify the three kittens in order to pass authentication.
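
As a rough illustration of how a nine-image test like this could be assembled and checked, here is a minimal Python sketch. The function names `build_challenge` and `passes`, and the image file names in the usage note, are hypothetical and are not taken from any real system.

```python
import random


def build_challenge(kittens, others, rng=random):
    """Pick 3 kitten images and 6 other images, then shuffle them.

    Returns the list of 9 images to display and the set of positions
    (0-8) that hold kittens, which the server keeps private.
    """
    chosen = [(img, True) for img in rng.sample(kittens, 3)]
    chosen += [(img, False) for img in rng.sample(others, 6)]
    rng.shuffle(chosen)
    images = [img for img, _ in chosen]
    answer = {i for i, (_, is_kitten) in enumerate(chosen) if is_kitten}
    return images, answer


def passes(answer, selected):
    """The user passes only by selecting exactly the kitten positions."""
    return set(selected) == answer
```

A caller would supply its own image pools, for example `build_challenge(["kitten0.jpg", ...], ["other0.jpg", ...])`, show the returned images, and compare the positions the user clicks against the stored answer set.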

Audio Tests

An alternative to a visual CAPTCHA test is an audio test, where a series of words or letters is spoken out loud and provided to users as an audio file; this audio is also overlaid with distortion of some type, in the same attempt to stop programmatic decoding.

Nevertheless, such tests have exactly the same problems as visual CAPTCHAs. They solve the visual issue, sure, but they do so by introducing yet another, equally problematic barrier. People who are deaf-blind, who work in a noisy environment, who lack the necessary hardware for sound output, or who are unable to understand the sound due to a cognitive disability or a language barrier are no better supported than with a conventional visual test.

Also, audio tests are just as vulnerable as visual ones to being cracked by suitably motivated bot programmers.

Logical or Semantic Puzzles

Eric Meyer’s Gatekeeper plugin for WordPress works by asking a simple question, framed in such a way as to make it very difficult for machines to understand while being blatantly obvious to humans.
Typical questions might be “What color is an orange?” or “How many sides does a triangle have?”
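
To make the idea concrete, here is a minimal Python sketch of a question-and-answer gate. It is not the Gatekeeper plugin’s code; the `QUESTIONS` table and the function names are assumptions made for illustration, seeded with the two example questions above.

```python
import random

# Hypothetical question bank: each question maps to its accepted answers.
QUESTIONS = {
    "What color is an orange?": {"orange"},
    "How many sides does a triangle have?": {"3", "three"},
}


def pick_question(rng=random):
    """Choose a question at random to show alongside the form."""
    return rng.choice(list(QUESTIONS))


def is_correct(question, answer):
    """Compare the answer case-insensitively, ignoring surrounding spaces."""
    return answer.strip().lower() in QUESTIONS.get(question, set())
```

Accepting several spellings of the same answer (here “3” and “three”) keeps the test forgiving for humans, although every accepted variant also widens the target for a guessing bot.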

The main weakness of this system is its scope. It has a limited number of questions and answers and is consequently vulnerable to brute-force attack. That problem can be reduced, but not solved entirely, by using flood control (preventing a single user from making numerous attempts within a particular timeframe) and by ensuring that the selection of questions is large and regularly changed.
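
Flood control of the kind described here can be sketched as a sliding-window counter. The Python below is one possible approach, not a reference implementation; the class name `FloodControl` and its default limits are assumptions chosen for the example.

```python
import time
from collections import defaultdict, deque


class FloodControl:
    """Allow at most max_attempts per client within window_seconds."""

    def __init__(self, max_attempts=3, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        # Client identifier -> timestamps of that client's recent attempts.
        self._attempts = defaultdict(deque)

    def allow(self, client_id, now=None):
        """Record an attempt and return True, or return False if throttled."""
        now = time.monotonic() if now is None else now
        recent = self._attempts[client_id]
        # Forget attempts that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True
```

The client identifier might be an IP address or a session ID. Neither is a reliable stand-in for “a single user,” which is one reason this only reduces the problem rather than solving it.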

But the system is also underpinned by assumptions of knowledge. Ideally, the questions should be so simple that a child could answer them easily, as is certainly the case with the examples above. But for every question, we still have to assume that any human can answer it, which might not be true, especially when you factor cognitive disability or language barriers into the equation.

Individual Authentication

For the highest level of security, individual authentication is often required. To log in to online banking, pay a credit-card bill, or vote, the system needs to know not just that you’re a human, but that you’re a particular human.

This type of authentication could be harnessed to offer a lower level of certainty in more general applications, as authentication for a system where your particular identity is not required, only that you’re a person.

The simplest approach here is to require users to register before being able to comment, post, or add content to a site. This undoubtedly reduces the amount of casual spam that a system might get, but it does nothing to put off a determined spammer who’s prepared to take the time to create an account.

It’s not hard to find large numbers of people prepared to do this kind of work for next to nothing, given the wide range of living costs across the world economy. It would be trivially cheap for a spammer in a rich country to pay people in a poor country to do this type of work all day.

Centralized Sign-on

A system of centralized sign-on can mitigate the potential for abuse by putting all the onus on a single system to authenticate users once, and then give them free rein thereafter.

Systems such as Microsoft Passport provide this type of centralization; however, they also raise substantial privacy questions, as you have to be prepared to trust your personal data to a single, commercial entity (quite apart from the fact that Passport uses CAPTCHA authentication!).

However, a more promising alternative to this has recently begun to gain traction, in the form of OpenID. The OpenID system avoids privacy issues because it isn’t limited to a single authentication provider: you can pick and choose, and change at any time, who you trust to hold your authentication data. This data in turn is not revealed to the site you’re visiting; consequently, it offers a convenient means of centralized authentication without the attendant privacy problems.

The weak point of the system is how you obtain an OpenID in the first place, since some form of authentication is going to be needed there. Merely having an OpenID is not sufficient to prove that you’re a legitimate user, so the onus would end up being on individual websites or OpenID providers to police the use of OpenID; for example, by banning OpenIDs that are known to belong to spammers. This in itself could end up being a minefield for disputes.

OpenID is a very good idea, and is bound to catch on, but in itself does not address the issue at hand any better than individual authentication.

Non-interactive Solutions

We’ve looked at a number of interactive solutions now, and seen how none of them are entirely ideal, either for protection from robot attack, or for reliably identifying humans without introducing accessibility barriers.

Perhaps the answer lies with non-interactive solutions. These analyze data as it’s being submitted, rather than relying on users to authenticate themselves.

Honey Traps

The idea here is that you include a form field, which is hidden with CSS, and give it a name that encourages spam bots to fill it in, such as “email2.” The human user will never fill it in because they don’t know it’s there, but the bot won’t be able to tell the difference. Therefore, if that field contains any value when the form is submitted, the submission is rejected.
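
On the server, the check amounts to a single test on the submitted form data. Here is a minimal Python sketch, assuming the form arrives as a dictionary of strings; the function name `is_probable_bot` is hypothetical, and the field name is the “email2” example from the text.

```python
# Name of the CSS-hidden trap field, taken from the example in the text.
HONEYPOT_FIELD = "email2"


def is_probable_bot(form):
    """Return True if the hidden honeypot field came back with a value."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A handler would call `is_probable_bot(form)` before processing the submission and reject it when the result is true. A missing or blank trap field is treated as a human submission.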

The problem is that assistive technologies may not be able to tell the difference either, and so their users might not know not to fill it in. That possibility could be reduced with descriptive text, such as “do not complete this field,” but doing that might be very confusing, as well as being recognizable by a bot.

Another variant of this is a simple trap that asks human users to confirm they’re not robots. This could take the form of a checkbox.

In both these examples, however, bots could learn to recognize the trap and thereby circumvent it. It’s one of those things that only works as long as not many people are using it; as soon as it became prevalent on high-traffic sites like Digg or Facebook, the spammers would simply adapt.

Session Keys

A partial remedy for form submission is to generate a session key on the fly when building the original form, and then check that session key when the form is submitted. This will stop bots that bypass the form and post directly to its target, but it does nothing to stop bots that go through the normal web form.
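
The issue-then-check cycle can be sketched in a few lines of Python. This assumes the server-side session is a plain dictionary and that the key is embedded in the form as a hidden field; the function names and the `"form_key"` entry are hypothetical.

```python
import hmac
import secrets


def issue_form_key(session):
    """Generate a fresh key when building the form and remember it."""
    key = secrets.token_urlsafe(32)
    session["form_key"] = key
    return key  # to be embedded in the form as a hidden field


def check_form_key(session, submitted):
    """Accept a submission only if it carries the key issued for this form."""
    expected = session.pop("form_key", None)  # each key is valid only once
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

Removing the key from the session on the first check means a captured key cannot be replayed, but a bot that loads the form like a browser still receives a valid key, which is exactly the limitation described above.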

Limited-use Accounts

One way for a system such as free email to limit abuse by robots is to deliberately throttle new accounts for a period of time; for example, by only permitting ten emails to be sent per day for the first month.
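
A throttle like this could be sketched as a per-account daily counter. The Python below is a minimal illustration that treats “the first month” as 30 days; the `Account` class, its `try_send` method, and the constant names are assumptions made for the example.

```python
from datetime import date, timedelta

PROBATION_DAYS = 30  # assumed length of "the first month"
DAILY_LIMIT = 10     # emails allowed per day while on probation


class Account:
    def __init__(self, created_on):
        self.created_on = created_on
        self._counted_day = None  # the day the counter below refers to
        self._sent_today = 0

    def try_send(self, today):
        """Return True and count the email if the account may send it."""
        if today - self.created_on >= timedelta(days=PROBATION_DAYS):
            return True  # probation is over, so no throttle applies
        if self._counted_day != today:
            # First email of a new day: reset the counter.
            self._counted_day, self._sent_today = today, 0
        if self._sent_today >= DAILY_LIMIT:
            return False
        self._sent_today += 1
        return True
```

For an account created with `Account(date(2011, 4, 26))`, the first ten calls to `try_send` on any day in the first 30 days succeed and the eleventh is refused.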

However, this approach may not ultimately help. It may reduce the incidence of abuse on a per-account basis, but it doesn’t stop abuse entirely. There’s also nothing to stop a spammer from simply signing up for thousands of accounts and sending ten spam emails from each one. And of course, such a limitation may affect legitimate users as well, but legitimate users aren’t going to be inclined to sign up for numerous accounts.

Conclusion

Don’t make users take responsibility for our problems.

Bots, and the damage they cause, are not the fault or responsibility of individual users, and it’s totally unfair to expect them to take the responsibility. They’re not the fault of site owners either, but like it or not they are our responsibility: it’s we who suffer from them, we who benefit from their eradication, and consequently we who must shoulder the burden. Moreover, the common theme with all interactive alternatives is that they fail users who have a cognitive disability, or don’t recognize the same cultural cues as the author, or use assistive technologies. The more stringent the system, the higher the bar is raised and therefore the greater the chance of failing to recognize or admit a real human.

The Future

It’s clear that both interactive and non-interactive tests will continue to be used by website owners for the foreseeable future. Developers will try to come up with new and better tests, and spammers will continue to find ways of cracking them; it’s very much a vicious circle.

Perhaps, at some point in the future, somebody will come up with a test that is truly reliable and uncrackable, something that identifies humans in a way that cannot be faked. Maybe biometric data such as fingerprints or retina scans could factor into that somewhere; perhaps we’ll have direct neural interfaces that identify the presence of brain activity.

So go gaga about CAPTCHA!

Posted in Semantic Web
