Contact Form Honeypots
On my portfolio page I have a form to send me an email—not exactly a new thing to do on the Internet. I’m sure many of you have built a similar form at some point in your life.
If you have ever had one of these contact forms, you’ve probably also realized that they are a very convenient tool for spammers to reach your inbox. Within a few hours of adding my contact form I was already getting unsolicited messages about cheap pharmaceuticals. Since the messages were being sent from an SMTP server I had configured myself, they went right past my spam filter.
Since I’d rather keep these messages from getting to my inbox, I decided to brainstorm some ways to stop the spam.
In order to provide some kind of effective defense we will have to make a few assumptions:
Assumption 1: A spammer is likely using automated tools to fill in and submit our form.
Given this, the first defense mechanism that comes to mind is a CAPTCHA. These are the de facto tool to determine if someone on your site is a human or a robot, so this would surely work.
But, as anyone who has used the Internet in the last 10 years can tell you, CAPTCHAs are a real pain in the butt for human users. Aside from the fact that they can be extremely annoying to read, they are also a total eyesore.
I think we can do better.
Assumption 2: Spammers submit the contact form shortly after requesting the page.
A real user will have to read over the page and type their message into the form by hand, while a computer program can do these things with little to no delay.
So why don’t we just time how long it takes for the user to submit the form?
More specifically, I’m suggesting that whenever your page is requested the server should insert a timestamp somewhere. Then, when the form is submitted, the request timestamp is subtracted from the submission timestamp to get the total time it took the user to view and submit the form. If this total is above some chosen threshold, the submission is considered “real”; if the form was submitted faster than that, the submission is considered “spam” and discarded, never to reach your inbox.
I actually implemented this approach on my site through the use of a hidden form element:
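As a minimal sketch of what such a hidden element could look like, assume the page is rendered by a Python function. The field name `loaded_at` and the helper `render_timestamp_field` are illustrative assumptions, not the names used on the real site:

```python
import html
import time


def render_timestamp_field(now=None):
    """Return a hidden <input> carrying the time the page was rendered.

    `now` can be passed in for testing; by default the current time is used.
    """
    # `loaded_at` is a hypothetical field name chosen for this sketch.
    rendered_at = int(time.time() if now is None else now)
    return '<input type="hidden" name="loaded_at" value="{}">'.format(
        html.escape(str(rendered_at), quote=True)
    )
```

The returned string would be placed inside the contact form's markup each time the page is served, so every visitor receives a fresh timestamp.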
Then, on submission:
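The check on submission might look roughly like the sketch below. It assumes the submitted form arrives as a dictionary of strings, that the hidden field is named `loaded_at` (a hypothetical name), and that five seconds is a reasonable threshold; all three are assumptions for illustration, not details from the original implementation.

```python
import time

# Assumed threshold in seconds; the right value depends on your form.
MIN_SECONDS = 5


def is_too_fast(form, now=None, min_seconds=MIN_SECONDS):
    """Return True if the submission arrived suspiciously quickly.

    `form` is a dict of submitted fields; `loaded_at` is the hypothetical
    hidden field holding the time the page was requested.
    """
    now = time.time() if now is None else now
    try:
        loaded_at = float(form.get("loaded_at", ""))
    except (TypeError, ValueError):
        # A missing or malformed timestamp is treated as spam.
        return True
    elapsed = now - loaded_at
    # Written as "not >=" so that NaN and future timestamps also count as spam.
    return not (elapsed >= min_seconds)
```

For example, `is_too_fast({"loaded_at": "1000"}, now=1002)` flags a submission made two seconds after the page was requested, while the same form submitted at `now=1030` passes.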
Unfortunately, while this approach did block some spambots, I was still receiving a few spam submissions on occasion. I’m not sure if this is because the bots were taking a while to submit, or if they were smarter (i.e., modifying the hidden element in some way).
Rather than investigate it further, I decided to supplement my efforts with another heuristic.
Assumption 3: Spammers waste no opportunity to fill in links and text promoting their products.
This is what led me to the “honeypot” approach. For those who aren’t familiar: in the world of computer security a honeypot is a type of trap that is intentionally disguised to look very attractive to the nefarious, while (ideally) not impacting the innocent users of a system.
So, for my honeypot, I decided to create a field in my contact form with the label of “website.”
And really: what spammer could possibly resist filling in such an enticing, targeted field for their awful scam sites?
With this, the spam detection heuristic is simple: if there is anything written in the “website” field, then the entire submission is spam and should be discarded.
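This heuristic is small enough to sketch in a few lines of Python. The sketch assumes the submitted form is available as a dictionary and that the decoy field is submitted under the name `website`; ignoring whitespace-only values is an extra assumption of this sketch, not part of the rule above.

```python
def is_honeypot_filled(form):
    """Return True if the decoy "website" field contains anything."""
    # Whitespace-only values are treated as empty (an assumption of this sketch).
    return bool(str(form.get("website") or "").strip())


# Hypothetical submissions, used only to show the filter in action.
submissions = [
    {"name": "Alice", "message": "Hi there!", "website": ""},
    {"name": "Bot", "message": "Cheap pills", "website": "http://spam.example"},
]

# Keep only submissions that left the honeypot untouched.
kept = [s for s in submissions if not is_honeypot_filled(s)]
```

Here `kept` contains only Alice's message, since the second submission filled in the decoy field.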
But what if a legitimate user decides to fill out this field? We will be discarding perfectly valid submissions! Well, we could write some text telling the user to not write anything in that field, but that hardly seems elegant…
After using both of the above techniques, I have seen 100% of the spam stemming from my website’s contact form disappear, and without having to resort to CAPTCHAs!