Contact Form Honeypots

On my portfolio page I have a form to send me an email—not exactly a new thing to do on the Internet. I’m sure many of you have built a similar form at some point in your life.

If you have ever had one of these contact forms, you’ve probably also realized that they are a very convenient tool for spammers to reach your inbox. Within a few hours of adding my contact form I was already getting unsolicited messages about cheap pharmaceuticals, and since the messages were being sent from an SMTP server I had configured myself, they went right past my spam filter.

Since I’d rather keep these messages from getting to my inbox, I decided to brainstorm some ways to stop the spam.

In order to provide some kind of effective defense we will have to make a few assumptions:

Assumption 1: A spammer is likely using automated tools to fill in and submit our form.

Given this, the first defense mechanism that comes to mind is a CAPTCHA. These are the de facto tool to determine if someone on your site is a human or a robot, so this would surely work.

But, as anyone who has used the Internet in the last 10 years can tell you, CAPTCHAs are a real pain in the butt for those human users. Aside from the fact that they can be extremely annoying to read, they are also a total eyesore.

I think we can do better.

Assumption 2: Spammers submit the contact form shortly after requesting the page

A real user will have to read over the page and type in their message into the form by hand, while a computer program can do these things with little to no delay.

So... why don’t we just time how long it takes for the user to submit the form?

More specifically, I’m suggesting that whenever your page is requested the server should insert a timestamp somewhere. Then, when the form is submitted, the request timestamp is subtracted from the submission timestamp to get the total time it took the user to view and submit the form. If this total is above some arbitrary threshold, then it should be considered “real” and if it is submitted too fast then it can be considered “spam” and discarded, never to reach your inbox.

I actually implemented this approach on my site through the use of a hidden form element:

<input type="hidden" name="time1" value="<%= new Date().getTime()%>" />

Then, on submission:

// The hidden field arrives as a string. parseInt turns a missing or empty
// value into NaN, so a stripped-out field is rejected by the check below.
var timeDiff = Date.now() - parseInt(req.body.time1, 10);
if (isNaN(timeDiff) || timeDiff < 6500) {
  console.info('The contact form was submitted too soon. Ignoring submission.');
  res.send(500, { error: 'You must wait at least 6.5 seconds before submitting the form. Please try again.' });
  return;
}
// otherwise, go ahead and send the email

Unfortunately, while this approach did block some spambots, I was still receiving a few spam submissions on occasion. I’m not sure if this is because the bots were taking a while to submit, or if they were smarter (i.e., modifying the hidden element in some way).
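
If the bots really were tampering with the hidden element, one way to rule that out would be to sign the timestamp on the server. This is not something I did on my site, just a minimal sketch using Node’s built-in crypto module. The names makeToken, elapsedSince and the FORM_SECRET environment variable are made up for illustration.

```javascript
var crypto = require('crypto');

// Assumption: a secret that only the server knows.
var SECRET = process.env.FORM_SECRET || 'change-me';

// HMAC-SHA256 of the timestamp, hex encoded.
function sign(timestamp) {
  return crypto.createHmac('sha256', SECRET)
    .update(String(timestamp))
    .digest('hex');
}

// Value to place in the hidden field: "<timestamp>.<signature>".
function makeToken(now) {
  return now + '.' + sign(now);
}

// Returns the elapsed milliseconds, or NaN if the token is malformed
// or its timestamp does not match its signature.
function elapsedSince(token, now) {
  var parts = String(token).split('.');
  if (parts.length !== 2) {
    return NaN;
  }
  var expected = Buffer.from(sign(parts[0]), 'hex');
  var given = Buffer.from(parts[1], 'hex');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (expected.length !== given.length ||
      !crypto.timingSafeEqual(expected, given)) {
    return NaN;
  }
  return now - Number(parts[0]);
}
```

Because a bad token yields NaN, the result could be dropped into the same isNaN(timeDiff) || timeDiff < 6500 check. A signature only proves that the server issued the timestamp; a bot could still hold on to a valid token and submit it later.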

Rather than investigate it further, I decided to supplement my efforts with another heuristic.

Assumption 3: Spammers waste no opportunity to fill in links and text promoting their products

This is what led me to the “honeypot” approach. For those who aren’t familiar: in the world of computer security a honeypot is a type of trap that is intentionally disguised to look very attractive to the nefarious, while (ideally) not impacting the innocent users of a system.

So, for my honeypot, I decided to create a field in my contact form with the label of “website.”

And really: what spammer could possibly resist filling in such an enticing, targeted field for their awful scam sites?

With this, the spam detection heuristic is simple: if there is anything written in the “website” field, then the entire submission is spam and should be discarded.
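
On the server, that check could look roughly like the sketch below. The helper name isHoneypotTripped is hypothetical, and it assumes the field is submitted under the name website in the parsed request body.

```javascript
// Hypothetical helper: returns true when the honeypot field was filled in.
// Assumes the form field is named "website".
function isHoneypotTripped(body) {
  var value = body && body.website;
  // A missing field counts as "not filled in".
  if (value === undefined || value === null) {
    return false;
  }
  // Whitespace alone does not count as content.
  return String(value).trim().length > 0;
}

// Possible use at the top of the form handler:
// if (isHoneypotTripped(req.body)) {
//   console.info('The honeypot field was filled in. Ignoring submission.');
//   return; // respond however you like, but do not send the email
// }
```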

But what if a legitimate user decides to fill out this field? We will be discarding perfectly valid submissions! Well, we could write some text telling the user to not write anything in that field, but that hardly seems elegant…

Assumption 4: Spammers’ automated tools likely don’t have a full-blown JavaScript interpreter, while nearly every modern web browser does

Writing a JavaScript interpreter is complicated. Even with open source interpreters like Google’s V8 out there, it would be cumbersome for a spammer to use one in their site-scraping tools given that many scripts do complex things (that the spammer wouldn’t care about) or run at unpredictable and asynchronous times. Adding JavaScript support would be a lot of extra overhead for the spammer that, in 90% of the cases, would have little to no payoff.

But, since our users will likely all have JavaScript enabled, we can simply use JavaScript to hide the “website” form element from the page immediately after it has loaded (and include that “Don’t write anything here” text in the hidden content as a backup for the one or two people out there using Lynx).
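
A minimal sketch of the hiding step, assuming the “website” label, input and backup text are wrapped in an element with the id website-wrapper (an id invented here for illustration):

```javascript
// Hypothetical helper: hides the element that wraps the honeypot field.
// Returns true if the element was found and hidden, false otherwise.
function hideHoneypot(doc, wrapperId) {
  var wrapper = doc.getElementById(wrapperId);
  if (!wrapper) {
    return false;
  }
  wrapper.style.display = 'none';
  return true;
}

// In a browser, hide the field as soon as the DOM is ready.
// The guard keeps the snippet harmless outside of a browser.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    hideHoneypot(document, 'website-wrapper');
  });
}
```

The field is still part of the form and still gets submitted, so the server-side check keeps working; it just isn’t visible to anyone whose browser ran the script.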

Results

After using both of the above techniques I have seen 100% of the spam stemming from my website’s contact form disappear, and without having to resort to CAPTCHAs!

Nothing is completely foolproof, however, and all it would take to bypass my methods is a spammer tool with a JavaScript interpreter that takes more than 6.5 seconds to submit the form. But given that spammers are playing a numbers game, it likely isn’t worth their time to fight against these measures when there are so many other, completely vulnerable contact forms out there for them to exploit.
