Probably Won’t Work

2003 December 2
by Kabitzin

I found this interesting article about how spammers are trying to trick Bayesian spam filters by stuffing their messages with blocks of random text or lines from books. It may seem like a good idea, but the padding makes the message a lot less effective for the few people who actually open it. More importantly, it shows that spammers did not take the time to read about how Bayesian spam filters work. I have posted about this issue before, since I clearly feel very strongly about all things Bayes and spam-fighting, and the article points out two great weaknesses in this new method of defeating a Bayesian filtering algorithm.

The first problem is that a Bayesian algorithm generally considers only the 15 or so most significant words in the email, so throwing in random words will not necessarily dilute the spam. Secondly, any good Bayesian filter considers many more factors than just the text in the body of the message. The header information in an email is often much more damning than the body, and each of these factors contributes only a portion of the probability that the email is spam. Moreover, these tactics should be fairly easy to defeat over time with proper training of the filter, since any insanely long email composed of nothing but text will stand out.

Bwahaha, probability is not so easily defeated. I should know, since I had to take all sorts of horrible classes about it at Cornell… x__X
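To make the point about the 15 or so most significant words concrete, here is a rough sketch in Python. It is not the code of any particular filter. It assumes the scoring scheme popularized by Paul Graham's "A Plan for Spam": keep the tokens whose spam probability is farthest from 0.5 and combine them with the naive Bayes product rule. The token table, the 0.4 score for never-seen words, and the names `TOKEN_SPAM_PROB` and `spam_probability` are all made up for illustration.

```python
from math import prod

# Hypothetical per-token spam probabilities, as a trained filter might store them.
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "unsubscribe": 0.97,
    "free": 0.95,
    "click": 0.93,
    "meeting": 0.05,
}
UNKNOWN_PROB = 0.4  # assumed score for tokens the filter has never seen
MAX_TOKENS = 15     # only the most significant tokens are considered


def spam_probability(tokens, probs=TOKEN_SPAM_PROB, n=MAX_TOKENS):
    """Combine the n most significant token scores into one spam probability."""
    # Score each distinct token once.
    scored = [probs.get(t, UNKNOWN_PROB) for t in set(tokens)]
    # "Most significant" here means farthest from the neutral value 0.5.
    top = sorted(scored, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(top)
    ham = prod(1 - p for p in top)
    return spam / (spam + ham)


spam_words = ["viagra", "unsubscribe", "free", "click"]
padding = [f"randomword{i}" for i in range(200)]  # stand-in for random filler

plain_score = spam_probability(spam_words)
padded_score = spam_probability(spam_words + padding)
print(f"plain:  {plain_score:.6f}")
print(f"padded: {padded_score:.6f}")
```

In this toy setup the 200 filler words barely move the score, because only 11 of them make it into the top 15 and each sits close to neutral. Filler that happened to include words the filter strongly associates with legitimate mail could do more damage, which is why the dilution is "not necessarily" rather than "never".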

The Penny Arcade Christmas project seems to be going well. It really seems like a worthy cause, and would you just look at all those smiling boxes! Besides being a great project in its own right, Child’s Play will hopefully get others to realize that gamers care, too.

Related posts:

  1. Keeping Spammers at Baye(sian)
  2. Not Bayesian This Time
  3. Beta Mail
