August 7, 2006

More greylisting details

Historical

Rob Mueller

Founder & CTO

I thought I’d take some of the things posted in the forum threads here and here and put them into one blog post.

There were actually two separate policies implemented at the same time: greylisting and “address enumeration detection” (we’ll call it AED). Greylisting is a method designed to stop spam being accepted from the large number of zombie computers that are connected to the internet. AED is designed to stop other people from probing to find which addresses are valid email addresses at FastMail.

One of the main concerns with greylisting is that naive implementations will often delay all email. In our implementation we’ve gone to great lengths to ensure that this doesn’t happen.

We only greylist hosts that appear to be dialup/dsl hosts of some sort, or hosts that don’t have any valid reverse DNS. This ensures that the vast majority of email servers are never subject to greylisting, and their email is not delayed.

If a host has been greylisted, and it successfully passes greylisting twice in a 24 hour period (e.g. it correctly attempts to re-deliver a piece of email twice in 24 hours), then that host is whitelisted and not subject to greylisting for the next 24 hours. If it continues to deliver emails, each new delivery will extend the whitelist period. This means that any real email servers (a real email server will always retry) connected via dialup/dsl will quickly be whitelisted and not subject to email delays.

If a host opens an SMTP session with a HELO that is not an IP address, is not the same as the reverse DNS of its connecting IP, but the forward DNS of the name does resolve to the connecting IP, then that host is not subject to greylisting. (As suggested by hadaso on the forum.)

An example: The machine at IP 206.223.169.73 connects to us. The reverse DNS for 206.223.169.73 is 206-223-169-73.beanfield.net, which looks like a common dialup/dsl IP name, and would be a candidate for greylisting. However, the machine advertises itself to us with a “HELO mx3.hub.org” line. Doing a forward lookup of mx3.hub.org gives the IP 206.223.169.73, which is the same as the connecting IP, so we exclude it from greylisting.
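
For the technically curious, here is a rough Python sketch of how the three rules above could fit together. It is not our actual code. The names (`looks_dynamic`, `helo_exempts`, `Whitelist`, `should_greylist`) are made up for illustration, the "looks like dialup/dsl" test is a simple assumed heuristic (the IPv4 octets appearing in the reverse DNS name), and forward DNS is passed in as a `resolve` function so the sketch can be run without touching the network.

```python
import ipaddress
import re
from typing import Callable, Dict, List, Optional, Set

WINDOW = 24 * 3600  # the 24 hour period, in seconds


def looks_dynamic(ip: str, rdns: Optional[str]) -> bool:
    """Assumed heuristic: no reverse DNS, or the IPv4 octets embedded in the name."""
    if not rdns:
        return True
    parts = ip.split(".")
    if len(parts) != 4:
        return False  # this sketch only handles IPv4
    a, b, c, d = parts
    return re.search(rf"(?<!\d){a}[-.]{b}[-.]{c}[-.]{d}(?!\d)", rdns) is not None


def helo_exempts(ip: str, rdns: Optional[str], helo: str,
                 resolve: Callable[[str], Set[str]]) -> bool:
    """HELO is a name, differs from the reverse DNS, and resolves back to the IP."""
    try:
        ipaddress.ip_address(helo.strip("[]"))
        return False  # HELO is an IP address
    except ValueError:
        pass
    if rdns and helo.lower().rstrip(".") == rdns.lower().rstrip("."):
        return False  # HELO is just the reverse DNS name
    return ip in resolve(helo)


class Whitelist:
    """Two greylist passes within 24 hours whitelist a host for 24 hours."""

    def __init__(self) -> None:
        self.passes: Dict[str, List[float]] = {}
        self.until: Dict[str, float] = {}

    def is_whitelisted(self, ip: str, now: float) -> bool:
        return self.until.get(ip, 0) > now

    def record_pass(self, ip: str, now: float) -> None:
        recent = [t for t in self.passes.get(ip, []) if now - t < WINDOW]
        recent.append(now)
        self.passes[ip] = recent
        if len(recent) >= 2:
            self.until[ip] = now + WINDOW

    def record_delivery(self, ip: str, now: float) -> None:
        # Each new delivery from a whitelisted host extends the period.
        if self.is_whitelisted(ip, now):
            self.until[ip] = now + WINDOW


def should_greylist(ip: str, rdns: Optional[str], helo: str,
                    resolve: Callable[[str], Set[str]],
                    whitelist: Whitelist, now: float) -> bool:
    if whitelist.is_whitelisted(ip, now):
        return False
    if not looks_dynamic(ip, rdns):
        return False  # ordinary mail server, never delayed
    if helo_exempts(ip, rdns, helo, resolve):
        return False
    return True
```

With the example above, `should_greylist("206.223.169.73", "206-223-169-73.beanfield.net", "mx3.hub.org", resolve, Whitelist(), 0)` returns `False` when `resolve("mx3.hub.org")` yields a set containing 206.223.169.73.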

When combined, these features strike an excellent balance between greylisting hosts that should not be sending email and letting hosts that should be sending email get their email straight through.

Additionally, to help with the tracking of any problems, once a message passes greylisting and is accepted, a new header, “X-Spam-greylist”, is added. This header tells you how many seconds the email was delayed and whether that host has been whitelisted for 24 hours. (Technical: Well, actually the delay figure is how long the last delay for the ip/sender/recipient combination was, so in the case of multiple emails from the same person, to the same person, from the same machine in a short time period, the figure will be a bit messy and hard to calculate.)
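
To show why that figure gets messy, here is a small Python sketch of one plausible way to track the delay per ip/sender/recipient combination. This is only an illustration, not our implementation: the `DelayTracker` class, its `attempt` method and the 60 second `min_delay` are all assumptions, and the sketch never expires old records.

```python
from typing import Dict, Optional, Tuple

Triplet = Tuple[str, str, str]  # (connecting IP, sender, recipient)


class DelayTracker:
    def __init__(self, min_delay: float = 60) -> None:
        self.min_delay = min_delay  # assumed value, not from the post
        self.first_seen: Dict[Triplet, float] = {}
        self.last_delay: Dict[Triplet, float] = {}

    def attempt(self, triplet: Triplet, now: float) -> Optional[float]:
        """Return None to temp-fail, else the delay figure to report."""
        if triplet in self.last_delay:
            # Combination already passed: report its last recorded delay,
            # not how long this particular message waited.
            return self.last_delay[triplet]
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay:
            return None  # too soon, ask the sender to retry later
        self.last_delay[triplet] = now - first
        return self.last_delay[triplet]
```

If two messages share a combination, the first arriving at time 0 and the second at time 30, and both are retried around time 300, each is accepted with the same reported figure (300), even though the second message itself only waited about 270 to 300 seconds.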

All up, we now have 4 lines of defense against spam:

RBLs (dsbl/xbl) - all users
Greylisting - all users
SpamAssassin - full/enhanced users
Backscatter detection - full/enhanced users

The combination of these 4 things provides an extremely strong defense against spam with absolutely no user interaction at this point. We also hope later this year to add per-user bayes databases, which will allow per-user training of a statistical database to catch the final spams that make it through all these filters.