Don’t Spam Me There

I’m working on a segment about fighting spam for The Screen Savers and I’d like some input.
I use a two-step approach to fight junk email that seems to work pretty well. My first line of defense is through my ISP, sonic.net. Like many ISPs, Sonic offers spam filtering on the mail server using an open-source program called SpamAssassin. SpamAssassin is a hefty Perl script which contains multiple rule sets. Each message is run through the program, which scores it based on these rules. As the end user I set a score threshold. Emails which score too high are held on the server and never touch my inbox. Set the threshold too high and extra spam gets through. Set it too low, and you’ll get false positives, the bane of spam filtering.
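To make the threshold idea concrete, here is a minimal sketch in Python of how a score-and-threshold filter behaves. The rule names, weights, and message format are all made up for illustration; they are not SpamAssassin's real rules or scores, and the real program is far more elaborate.

```python
# Hypothetical rules: (name, weight, test). These are illustrative only and
# are NOT SpamAssassin's actual rule names or weights.
RULES = [
    ("SUBJECT_ALL_CAPS", 2.0, lambda m: m["subject"].isupper()),
    ("MENTIONS_HAIR_RESTORATION", 3.5,
     lambda m: "hair restoration" in m["body"].lower()),
    ("MANY_EXCLAMATIONS", 1.5, lambda m: m["body"].count("!") >= 3),
]


def score(message):
    """Add up the weights of every rule the message triggers."""
    return sum(weight for _name, weight, test in RULES if test(message))


def is_held(message, threshold=6.5):
    """Hold the message on the server if its score reaches the threshold."""
    return score(message) >= threshold


obvious_spam = {"subject": "FREE HAIR NOW",
                "body": "Hair restoration!!! Act now"}
borderline = {"subject": "NEW OFFER",
              "body": "Try our hair restoration clinic."}
friendly = {"subject": "Lunch on Friday?", "body": "See you at noon."}

for msg in (obvious_spam, borderline, friendly):
    print(msg["subject"], score(msg), is_held(msg))
```

The borderline message is the interesting one: it gets through at a threshold of 6.5 but would be held at 5.0, which is exactly the trade-off between letting spam through and risking false positives.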

After playing around with the settings I’ve found that a threshold score of 6.5 stops 90% of my spam and never stops mail I want. SpamAssassin kills an average of 120 spams a day on my main account. That’s several megabytes of hair restoration ads I never have to download. I review the spam mailbox every few days to make sure it hasn’t trapped anything I want, and after several months of operation I’ve found it to be quite reliable.

But what of the 10% of spam messages that sneak by SpamAssassin? For that I use client-side filtering. I do all my email on Mac OS X using a streamlined and powerful program called PowerMail. I use an add-on spam filter called SpamSieve by Michael Tsai with PowerMail. It also works with MailSmith, Apple Mail, and Entourage. SpamSieve uses a new technique to detect spam called “Bayesian filtering.” Bayesian analysis of text has been around for years. As far as I can tell, Paul Graham was the first to propose its use in fighting junk email in his article “A Plan For Spam”. (Do read the article – it’s the best explanation of the issues in fighting spam I’ve ever read. And make sure to check the links at the bottom.)
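If you're curious what "Bayesian filtering" looks like in practice, here is a toy version in Python. It is a plain naive Bayes classifier with add-one smoothing, which is a simplification I chose for illustration: it is not the exact formula from Graham's article, and it is certainly not SpamSieve's implementation. The class name, the whitespace tokenizer, and the training messages are all assumptions.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter (illustrative, not any product's code)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    @staticmethod
    def tokenize(text):
        # Crude tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text, label):
        """Feed the filter one message labeled 'spam' or 'good'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["good"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "good"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed prior: how common this kind of message has been.
            logp = math.log((self.message_counts[label] + 1)
                            / (total_messages + 2))
            for word in self.tokenize(text):
                # Add-one smoothing so unseen words never give log(0).
                logp += math.log((self.word_counts[label][word] + 1)
                                 / (total_words + len(vocab) + 1))
            log_score[label] = logp
        # Turn the two log scores into one probability, clamped to avoid
        # overflow in exp() on very long messages.
        diff = max(min(log_score["good"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy hair restoration now", "spam")
spam_filter.train("lunch on friday", "good")
spam_filter.train("meeting notes attached", "good")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("lunch meeting friday"))
```

The point is that nobody writes the rules by hand. The filter learns which words show up in your spam and which show up in your good mail, so every message you correct makes it a little smarter about your particular inbox.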

As Graham points out, most spam filters end up working like pesticides. They simply breed smarter spammers. Because Bayesian-based filtering techniques continue to learn and evolve, they can be expected to keep up with spammers. I’ve not found that to be completely true, but once you’ve fed 500 or so good and bad messages to the filter, it does do a very good job of detecting the bad stuff. According to SpamSieve’s own statistics on my machine it has processed 8,018 spam messages and 48,195 good messages with a 98.7% accuracy rate. In other words, it only missed 572 penis enlarger ads, and incorrectly marked 186 messages from my mom as spam. (Which might be the first time “penis enlarger” and “my mom” have ever appeared together in a sentence.) That’s still 186 false positives too many, but it’s the best I’ve found to date. SpamSieve is particularly accurate with mailing lists. Many spam filters incorrectly tag newsletters as spam. I subscribe to several dozen lists. Thanks to the combination of SpamAssassin and SpamSieve I haven’t missed any issues.
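For anyone who wants to check the math on those SpamSieve numbers, here is the arithmetic in Python. I'm assuming "accuracy" means correctly classified messages divided by all messages processed; SpamSieve may define it differently, and the variable names are mine.

```python
# Figures quoted from SpamSieve's statistics window.
spam_seen = 8018
good_seen = 48195
missed_spam = 572        # spam that slipped through (false negatives)
false_positives = 186    # good mail wrongly marked as spam

total = spam_seen + good_seen
errors = missed_spam + false_positives

# Assumed definition: share of all messages that were classified correctly.
accuracy = 1 - errors / total
miss_rate = missed_spam / spam_seen                 # share of spam missed
false_positive_rate = false_positives / good_seen   # share of good mail flagged

print(f"accuracy:            {accuracy:.1%}")
print(f"spam missed:         {miss_rate:.1%}")
print(f"good mail flagged:   {false_positive_rate:.2%}")
```

Under that assumption the 98.7% figure checks out. Broken down, the filter misses roughly 7% of the spam it sees but flags well under half a percent of good mail, which is the right way around: a missed spam costs a keystroke, while a false positive can cost a message you wanted.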

SpamAssassin uses a combination of Bayesian techniques, rule-based filters, and whitelists and blacklists to do its job. Its developers are constantly fiddling with the rules, so it seems to keep up pretty well with the spammers. Why is it spammers try so hard to get past mail filters? Clearly if I’m filtering on the word Viagra, I don’t want to see messages about it. What’s the point in spelling it V i a g r a? Maybe it’s because most spammers aren’t trying to sell anything at all. According to an interesting study by Wired News, most spam is designed merely to harvest your email address. That’s why you should never reply to spam – even to complain.
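The spaced-out spelling trick is easy to see through in code, which is part of why it's an arms race. Here is a rough Python sketch of one way a keyword filter might undo it before matching. The function names and the list of separator characters are my own assumptions, not how SpamAssassin or any other filter actually does it, and the pattern will also glue together innocent runs of single letters such as "x y z".

```python
import re

# Three or more single letters separated by a space, dot, dash, underscore,
# or asterisk, e.g. "V i a g r a" or "V.i.a.g.r.a". The separator list is
# an assumption made for this example.
_SPACED_WORD = re.compile(r"\b(?:[A-Za-z][ .\-_*]){2,}[A-Za-z]\b")


def collapse_spaced_letters(text):
    """Rejoin words that were spelled out one letter at a time."""
    return _SPACED_WORD.sub(
        lambda match: re.sub(r"[ .\-_*]", "", match.group(0)), text)


def mentions(text, word):
    """Case-insensitive keyword check after undoing the spacing trick."""
    return word.lower() in collapse_spaced_letters(text).lower()


print(collapse_spaced_letters("Buy V i a g r a today"))
print(mentions("Cheap V.i.a.g.r.a", "viagra"))
```

Of course the spammer's next move is a misspelling or a digit swapped for a letter, which is one reason filters that learn from your mail tend to age better than fixed word lists.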

Since I don’t use Windows to read email any more, I don’t have much experience with Windows-based spam filters, Bayesian or otherwise. I’ve been waiting for my buddy Mark Thompson to ship his long-awaited Spambo. He showed me a pre-release version during the Call for Help-a-thon in December, and it looked pretty amazing, but for some reason he’s holding on to it. Meanwhile I’ve been trying a Bayesian-based program called Ella from Open Field Software. It’s an Outlook plug-in and it does a fairly good job. Ultimately Open Field plans to release it as an automatic mail categorizer, much like John Graham-Cumming’s POPFile. We’ve featured John and POPFile on the show, but many users report that they find it a little confusing to set up. I’m going to give it a try this weekend and I’ll let you know.

I also subscribe to SpamCop but I no longer use it to filter my mail. For $30 a year you can run all your incoming mail through SpamCop before it hits your inbox. If your ISP doesn’t offer SpamAssassin, this might make a good alternative, but I found that SpamCop was stopping too much legitimate mail, especially mailing lists. I do use my SpamCop email address whenever I have to give an address to a web site, however. I find that the @spamcop.net address alone seems to be enough to deter them from selling my name.

I’d like to review some other solutions, too, but there are so many spam filters for Windows I hardly know where to start. That’s where I need your help. Which spam fighters have you tried, and what has been your experience with them? Please add your thoughts to the comments section here and I will be glad to credit you in the final article for The Screen Savers web site.