Ten Spam-Filtering Methods Explained
Learn how different spam-fighting techniques work
By: Brian Satterfield
November 30, 2006
This article was originally posted on TechSoup US.
While new computer security threats may come and go, spam remains a constant nuisance for nonprofits. At a minimum, spam can interrupt your busy days, forcing you to spend time opening and deleting emails hawking herbal remedies or once-in-a-lifetime investment opportunities. In a more serious scenario, spam could unleash a nasty virus on your organization's network, crippling your servers and desktop machines.
Experts and anti-spam services tend to peg the rate of spam at anywhere from 50 to 90 percent of all emails on the Internet. Although preventing tenacious spammers from sending junk mail may never be possible, installing an anti-spam application on your organization's mail server or individual computers can vastly reduce the amount of spam your staffers have to deal with. Anti-spam applications typically use one or more filtering methods to identify spam and stop it from reaching a user's inbox. But just because anti-spam programs are designed to do the same job doesn't mean they all go about it in the same way.
For instance, some spam-filtering methods run a series of checks on each message to determine the likelihood that it is spam. Other spam-filtering techniques simply block all email transmissions from known spammers or only allow email from certain senders. And while some spam-filtering methods are completely transparent to both the sender and recipient, others require some degree of user interaction.
Whether your nonprofit plans to implement its first anti-spam solution or simply seeks a more effective application than the one you currently use, familiarizing yourself with common anti-spam methods can help you decide which products to investigate more closely. To help you in your research, we'll explain how 10 popular anti-spam methods work and briefly outline some of their pros and cons.
As you read the descriptions of the spam-filtering methods below, start thinking about which techniques you want — or don't want — your anti-spam application to use. Consider factors such as the scope of your current spam problem and how much work users at your nonprofit are willing to do to stop unwanted email. You may find it helpful to print this article and circle methods that interest you with a pen; that way, when you actually start to research particular products, you'll have a shortlist of desired filtering techniques.
List-based filters attempt to stop spam by categorizing senders as spammers or trusted users, and blocking or allowing their messages accordingly.
A blacklist, one of the most popular spam-filtering methods, attempts to stop unwanted email by blocking messages from a preset list of senders that you or your organization's system administrator creates. Blacklists are records of email addresses or Internet Protocol (IP) addresses that have been previously used to send spam. When an incoming message arrives, the spam filter checks whether the sender's IP or email address is on the blacklist; if so, the message is considered spam and rejected.
Though blacklists ensure that known spammers cannot reach users' inboxes, they can also misidentify legitimate senders as spammers. These so-called false positives can result if a spammer happens to be sending junk mail from an IP address that is also used by legitimate email users. Also, since many clever spammers routinely switch IP addresses and email addresses to cover their tracks, a blacklist may not immediately catch the newest outbreaks.
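As a rough illustration of the check described above, here is a minimal Python sketch. The addresses, the IP range, and the name is_blacklisted are invented for this example and do not come from any particular product.

```python
import ipaddress

# Hypothetical entries; a real blacklist would be maintained by an administrator.
BLOCKED_ADDRESSES = {"offers@junkmail.example"}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blacklisted(sender_address: str, sender_ip: str) -> bool:
    """Return True if the sender's email address or IP address is on the blacklist."""
    if sender_address.lower() in BLOCKED_ADDRESSES:
        return True
    ip = ipaddress.ip_address(sender_ip)
    return any(ip in network for network in BLOCKED_NETWORKS)


print(is_blacklisted("offers@junkmail.example", "198.51.100.7"))   # True
# A legitimate sender who shares a blocked IP range is rejected too (a false positive).
print(is_blacklisted("director@partner.example", "203.0.113.45"))  # True
print(is_blacklisted("director@partner.example", "198.51.100.7"))  # False
```

The second call shows how a false positive can arise: the sender is legitimate, but the message comes from an IP address that is on the list.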
Real-Time Blackhole List
This spam-filtering method works almost identically to a traditional blacklist but requires less hands-on maintenance. That's because most real-time blackhole lists are maintained by third parties, who take the time to build comprehensive blacklists on behalf of their subscribers. Your filter simply has to connect to the third-party system each time an email comes in to compare the sender's IP address against the list.
Since blackhole lists are large and frequently updated, your organization's IT staff won't have to spend time manually adding new IP addresses, and the filter is more likely to catch the newest junk-mail outbreaks. But like blacklists, real-time blackhole lists can also generate false positives if spammers happen to use a legitimate IP address as a conduit for junk mail. Also, since the list is likely to be maintained by a third party, you have less control over which addresses are on the list and which are not.
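Many real-time blackhole lists are queried over DNS: the filter reverses the octets of the sender's IPv4 address, appends the list's zone name, and treats any answer as "listed." The Python sketch below follows that convention. The zone dnsbl.example.org is a placeholder rather than a real service, the function names are invented for this example, and IPv6 is not handled.

```python
import socket


def dnsbl_query_name(sender_ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 203.0.113.45 -> 45.113.0.203.<zone>."""
    octets = sender_ip.split(".")
    if len(octets) != 4:
        raise ValueError("this sketch handles IPv4 addresses only")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(sender_ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the list answers for this IP; a failed lookup means 'not listed'."""
    try:
        resolve(dnsbl_query_name(sender_ip, zone))
    except socket.gaierror:
        return False
    return True


print(dnsbl_query_name("203.0.113.45", "dnsbl.example.org"))
# 45.113.0.203.dnsbl.example.org
```

The resolve parameter exists only so the lookup can be replaced with a stand-in during testing; by default it performs a live DNS query each time a message arrives.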
A whitelist blocks spam using a system almost exactly opposite to that of a blacklist. Rather than letting you specify which senders to block mail from, a whitelist lets you specify which senders to allow mail from; these addresses are placed on a trusted-users list. Most spam filters let you use a whitelist in addition to another spam-fighting feature as a way to cut down on the number of legitimate messages that accidentally get flagged as spam. However, using a very strict filter that only uses a whitelist would mean that anyone who was not approved would automatically be blocked.
Some anti-spam applications use a variation of this system known as an automatic whitelist. In this system, an unknown sender's email address is checked against a database; if they have no history of spamming, their message is sent to the recipient's inbox and they are added to the whitelist.
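The Python sketch below shows one way a whitelist might sit in front of another filter, plus a simplified automatic whitelist. All names here (route_message, auto_whitelist, the looks_like_spam callback) are hypothetical, and the "database" is just a set of known spammer addresses.

```python
from typing import Callable, Set


def route_message(sender: str, body: str, whitelist: Set[str],
                  looks_like_spam: Callable[[str], bool],
                  whitelist_only: bool = False) -> str:
    """Trusted senders always get through; everyone else faces the other filter."""
    if sender.lower() in whitelist:
        return "inbox"
    if whitelist_only:
        # Very strict setup: anyone who is not approved is blocked.
        return "blocked"
    return "blocked" if looks_like_spam(body) else "inbox"


def auto_whitelist(sender: str, known_spammers: Set[str], whitelist: Set[str]) -> bool:
    """Add an unknown sender to the whitelist if they have no history of spamming."""
    if sender.lower() in known_spammers:
        return False
    whitelist.add(sender.lower())
    return True
```

In this arrangement the whitelist only ever rescues mail; it never blocks anything unless whitelist_only is switched on.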
A relatively new spam-filtering technique, greylisting takes advantage of the fact that many spammers attempt to send a batch of junk mail only once. Under the greylist system, the receiving mail server initially rejects messages from unknown senders and sends a failure message to the originating server. If the originating server attempts to send the message a second time (a step most legitimate servers will take), the greylist assumes the message is not spam and lets it proceed to the recipient's inbox. At this point, the greylist filter adds the sender's email or IP address to a list of allowed senders.
Though greylist filters require fewer system resources than some other types of spam filters, they also may delay mail delivery, which could be inconvenient when you are expecting time-sensitive messages.
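The reject-then-accept behavior can be sketched in a few lines of Python. The class name, the return strings, and the minimum retry delay are assumptions made for this example; the article itself only says that a second attempt is accepted.

```python
class Greylist:
    """Toy greylist keyed by the sender's email or IP address."""

    def __init__(self, min_retry_delay: float = 300.0):
        # Assumed setting: retries sooner than this many seconds are still turned away.
        self.min_retry_delay = min_retry_delay
        self.first_seen = {}   # sender key -> time of first delivery attempt
        self.allowed = set()   # senders that have already retried successfully

    def check(self, sender_key: str, now: float) -> str:
        if sender_key in self.allowed:
            return "accept"
        first = self.first_seen.get(sender_key)
        if first is None:
            # Unknown sender: remember the attempt and ask the server to try again.
            self.first_seen[sender_key] = now
            return "retry later"
        if now - first >= self.min_retry_delay:
            # The originating server came back, as most legitimate servers do.
            self.allowed.add(sender_key)
            return "accept"
        return "retry later"


greylist = Greylist()
print(greylist.check("198.51.100.7", now=0))    # retry later
print(greylist.check("198.51.100.7", now=600))  # accept
```

The delay between the first attempt and the accepted retry is exactly the delivery lag mentioned above.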
Rather than enforcing across-the-board policies for all messages from a particular email or IP address, content-based filters evaluate words or phrases found in each individual message to determine whether an email is spam or legitimate.
A word-based spam filter is the simplest type of content-based filter. Generally speaking, word-based filters simply block any email that contains certain terms.
Since many spam messages contain terms not often found in personal or business communications, word filters can be a simple yet capable technique for fighting junk email. However, if configured to block messages containing more common words, these types of filters may generate false positives. For instance, if the filter has been set to stop all messages containing the word "discount," emails from legitimate senders offering your nonprofit hardware or software at a reduced price may not reach their destination. Also note that since spammers often purposefully misspell keywords in order to evade word-based filters, your IT staff will need to make time to routinely update the filter's list of blocked words.
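A word-based filter can be sketched in Python as follows. The blocked-word list and the function name are invented for this example.

```python
import re

# Hypothetical list of blocked terms.
BLOCKED_WORDS = {"viagra", "discount"}


def contains_blocked_word(body: str) -> bool:
    """Return True if any blocked term appears as a whole word in the message."""
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    return not words.isdisjoint(BLOCKED_WORDS)


print(contains_blocked_word("Cheap Viagra here"))                       # True
# A legitimate offer is blocked because "discount" is on the list (a false positive).
print(contains_blocked_word("We offer nonprofits a discount on software"))  # True
# A deliberate misspelling slips past the filter.
print(contains_blocked_word("Cheap V1agra here"))                       # False
```

The last two calls illustrate the two weaknesses described above: common words cause false positives, and misspellings evade the list until someone updates it.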
Heuristic (or rule-based) filters take things a step beyond simple word-based filters. Rather than blocking messages that contain a suspicious word, heuristic filters take multiple terms found in an email into consideration.
Heuristic filters scan the contents of incoming emails and assign points to words or phrases. Suspicious words that are commonly found in spam messages, such as "Rolex" or "Viagra," receive higher points, while terms frequently found in normal emails receive lower scores. The filter then adds up all the points and calculates a total score. If the message receives a certain score or higher (determined by the anti-spam application's administrator), the filter identifies it as spam and blocks it. Messages that score lower than the target number are delivered to the user.
Heuristic filters work fast — minimizing email delay — and are quite effective as soon as they have been installed and configured. However, heuristic filters configured to be aggressive may generate false positives if a legitimate contact happens to send an email containing a certain combination of words. Similarly, some savvy spammers might learn which words to avoid including, thereby fooling the heuristic filter into believing they are benign senders.
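The scoring process can be sketched in Python like this. The point values, the threshold, and the use of negative points for everyday terms are assumptions chosen for illustration, not values from any real product.

```python
import re

# Hypothetical point table: suspicious words score high, everyday terms score low.
WORD_POINTS = {"rolex": 3.0, "viagra": 3.0, "free": 1.0, "meeting": -1.0, "agenda": -1.0}
SPAM_THRESHOLD = 4.0  # would be set by the anti-spam application's administrator


def spam_score(body: str) -> float:
    """Add up the points for every scored word found in the message."""
    return sum(WORD_POINTS.get(word, 0.0) for word in re.findall(r"[a-z]+", body.lower()))


def is_spam(body: str) -> bool:
    return spam_score(body) >= SPAM_THRESHOLD


print(spam_score("Free Rolex and Viagra"))        # 7.0
print(is_spam("Free Rolex and Viagra"))           # True
print(spam_score("Agenda for the free meeting"))  # -1.0
print(is_spam("Agenda for the free meeting"))     # False
```

Lowering SPAM_THRESHOLD makes the filter more aggressive, which catches more junk but raises the odds of a false positive.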
Bayesian filters, considered the most advanced form of content-based filtering, employ the laws of mathematical probability to determine which messages are legitimate and which are spam. In order for a Bayesian filter to effectively block spam, the end user must initially "train" it by manually flagging each message as either junk or legitimate. Over time, the filter takes words and phrases found in legitimate emails and adds them to a list; it does the same with terms found in spam.
To classify an incoming message, the Bayesian filter scans the contents of the email and compares the text against its two word lists to calculate the probability that the message is spam. For instance, if the word "valium" has appeared 62 times in spam messages but only three times in legitimate emails, there is roughly a 95 percent chance (62 out of 65 occurrences) that an incoming email containing the word "valium" is junk.
Because a Bayesian filter is constantly building its word list based on the messages that an individual user receives, it theoretically becomes more effective the longer it's used. However, since this method does require a training period before it starts working well, you will need to exercise patience and will probably have to manually delete a few junk messages, at least at first.
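The Python sketch below reproduces the "valium" arithmetic and shows one common way to combine several word probabilities into a single score. It is a simplified illustration: the class and method names are invented, the per-word estimate is a plain ratio of counts (which assumes the filter has seen comparable amounts of spam and legitimate mail), and the combining formula is a standard naive Bayes shortcut rather than the method of any specific product.

```python
import math
import re
from collections import Counter


def tokenize(body: str):
    return re.findall(r"[a-z0-9']+", body.lower())


class BayesianFilter:
    def __init__(self):
        self.spam_counts = Counter()  # words seen in messages flagged as junk
        self.ham_counts = Counter()   # words seen in messages flagged as legitimate

    def train(self, body: str, is_spam: bool) -> None:
        """Record the words of a message the user has flagged."""
        (self.spam_counts if is_spam else self.ham_counts).update(tokenize(body))

    def word_probability(self, word: str) -> float:
        spam, ham = self.spam_counts[word], self.ham_counts[word]
        if spam + ham == 0:
            return 0.5  # never seen: no evidence either way
        return spam / (spam + ham)

    def message_probability(self, body: str) -> float:
        # Clamp each estimate so a single word can never force 0 or 1.
        probs = [min(max(self.word_probability(w), 0.01), 0.99) for w in set(tokenize(body))]
        spam_like = math.prod(probs)
        ham_like = math.prod(1 - p for p in probs)
        return spam_like / (spam_like + ham_like)


bayes = BayesianFilter()
bayes.spam_counts["valium"] = 62
bayes.ham_counts["valium"] = 3
print(round(bayes.word_probability("valium"), 2))  # 0.95
```

Every call to train adjusts the counts, which is why the filter's estimates improve as the user flags more messages.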
Other Filtering Methods
In addition to list- and content-based filtering techniques, some anti-spam applications employ one or more additional methods.
Filters that use a challenge/response system block undesirable emails by forcing the sender to perform a task before their message can be delivered. For instance, if you send an email to someone who’s using a challenge/response filter, you’ll likely receive an email right back that asks you to visit a Web page and enter the code displayed there into a form. If you successfully complete this task, your email (and all future emails) will be delivered to the recipient. If you don’t complete the challenge after a certain time period, the message is rejected.
This system works to fight spam because the "challenge" is typically one that only a human can solve. Spammers usually rely on automated mailing programs to send out millions of emails at once, and they rarely check to see what emails come back in response. And even if they did see a challenge message, they aren't likely to respond and risk revealing themselves as spammers.
However, challenge/response filters might also block email newsletters you subscribe to, as these messages are typically sent by automated programs. Another downside is that some of your organization's constituents may not take the time to complete the challenge or may not understand the challenge email, meaning that their messages will not reach the recipient. And there's always the slight chance that if both the sender and recipient are using challenge/response systems, their anti-spam applications will continue to challenge each other, locking the email in an undeliverable loop.
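The hold-challenge-release cycle can be sketched in Python as follows. The class, its method names, the return strings, and the one-week timeout are all assumptions made for this example; sending the actual challenge email and hosting the Web form are left out.

```python
import secrets


class ChallengeResponseFilter:
    def __init__(self, timeout_seconds: float = 7 * 24 * 3600):
        self.timeout_seconds = timeout_seconds  # assumed expiry period
        self.approved_senders = set()
        self.pending = {}  # sender -> {"code", "issued", "held"}

    def receive(self, sender: str, message: str, now: float) -> str:
        if sender in self.approved_senders:
            return "delivered"
        if sender in self.pending:
            self.pending[sender]["held"].append(message)
            return "held"
        # First contact: hold the message and issue a code for the sender to enter.
        self.pending[sender] = {"code": secrets.token_hex(4), "issued": now, "held": [message]}
        return "challenge sent"

    def answer(self, sender: str, code: str, now: float) -> list:
        """Return the held messages to deliver, or an empty list if the answer fails."""
        entry = self.pending.get(sender)
        if entry is None:
            return []
        if now - entry["issued"] > self.timeout_seconds:
            del self.pending[sender]  # challenge expired: held mail is rejected
            return []
        if code != entry["code"]:
            return []
        del self.pending[sender]
        self.approved_senders.add(sender)  # all future emails are delivered directly
        return entry["held"]
```

Note that an automated newsletter would never call answer, so its messages would stay held until the challenge expires.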
Collaborative content filtering takes a community-based approach to fighting spam by collecting input from the millions of email users around the globe. Users of these systems can flag incoming emails as legitimate or spam and these notations are reported to a central database. After a certain number of users mark a particular email as junk, the filter automatically blocks it from reaching the rest of the community's inboxes.
When a collaborative content filtering system involves a large, active user base, it can quickly quell a spam outbreak, sometimes within a matter of minutes. One potential downside to the collaborative-content method is that if a group of spammers mobilizes in large numbers and pretends to be legitimate users of the system, they could skew results by falsely labeling spam emails as legitimate messages.
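A collaborative filter's central database can be sketched in Python like this. The fingerprinting scheme, the threshold, and the rule that "legitimate" votes cancel out "spam" votes are assumptions made for this example; real services use their own, usually more robust, methods.

```python
import hashlib


class CollaborativeFilter:
    def __init__(self, block_threshold: int = 3):
        self.block_threshold = block_threshold  # assumed number of net spam votes needed
        self.votes = {}  # message fingerprint -> {user_id: flagged_as_spam}

    @staticmethod
    def fingerprint(body: str) -> str:
        """Identify a message by a hash of its text, ignoring case and spacing."""
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def flag(self, user_id: str, body: str, is_spam: bool) -> None:
        # One vote per user per message; a later vote replaces an earlier one.
        self.votes.setdefault(self.fingerprint(body), {})[user_id] = is_spam

    def is_blocked(self, body: str) -> bool:
        votes = self.votes.get(self.fingerprint(body), {})
        spam_votes = sum(1 for flagged in votes.values() if flagged)
        legit_votes = len(votes) - spam_votes
        return spam_votes - legit_votes >= self.block_threshold
```

Because "legitimate" votes subtract from the total in this sketch, a handful of fake accounts voting that way would be enough to unblock a spam message, which is the skewing risk described above.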
DNS Lookup Systems
Several anti-spam methods use the domain name system (DNS), which all mail servers on the Internet use to identify themselves, to identify and foil spammers, although DNS checks are not particularly reliable on their own.
A DNS Mail Exchange (MX) lookup attempts to verify that the domain name in the sender's email address (the part after the at symbol, @) exists. It does this by searching the domain name system to see whether the domain name has a valid MX record, which indicates the presence of a real mail server; if there's no match, the anti-spam program assumes that the message is junk. A filter will also perform a reverse DNS lookup using the IP address of the mail server that sent the questionable message. This lookup will reveal the domain name associated with the server.
While DNS lookups can be useful in weeding out emails from spammers attempting to disguise themselves, they are not as effective or reliable on their own (when compared to other spam-fighting methods) in stopping general junk mail. In particular, reverse DNS lookups have been known to produce false positives — legitimate messages marked as spam — since it's technically possible that legitimate senders can send email from a domain different from their own.
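The two DNS checks can be sketched in Python as shown below. Python's standard library can perform a reverse lookup but has no MX query, so the MX check is passed in as a caller-supplied function; has_mx_record, dns_checks, and the other names are hypothetical.

```python
import socket
from typing import Callable, Optional


def sender_domain(address: str) -> str:
    """Return the part of an email address after the at symbol."""
    return address.rsplit("@", 1)[1].lower()


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the host name DNS associates with an IP address, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0].lower()
    except (socket.herror, socket.gaierror):
        return None


def dns_checks(address: str, sending_ip: str,
               has_mx_record: Callable[[str], bool],
               reverse: Callable[[str], Optional[str]] = reverse_lookup) -> dict:
    domain = sender_domain(address)
    host = reverse(sending_ip)
    return {
        "domain_has_mx": has_mx_record(domain),
        "reverse_dns_name": host,
        # True only when the sending server's name falls under the sender's own domain.
        "reverse_matches_domain": host is not None
        and (host == domain or host.endswith("." + domain)),
    }
```

A legitimate organization that sends mail through a hosting company's server would fail the reverse_matches_domain test, which is why this signal is better treated as one input among several than as a verdict.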
Researching Spam-Filtering Products
Now that you know how different anti-spam methods go about stopping unwanted email, you'll be better prepared when it comes time to research the many products on the market. Since none of the aforementioned anti-spam methods are 100 percent foolproof, you may want to seek out a product that uses several different spam-fighting methods; doing so further decreases the amount of junk mail your organization will have to deal with.
To see how 41 commercial anti-spam products stack up, read Network World's article Spam in the Wild: The Sequel, which includes information on features and performance-test results. A WinPlanet article called Anti-Spam Round-Up: Stop the Spam-sanity also offers reviews of five well-known commercial applications. If your nonprofit's budget is too tight to consider purchasing anti-spam software, you might examine a few of the utilities listed in About.com's article Top 10 Free Windows Spam Filters.