Joe Wein
Fighting spam and scams
on the Internet

Using our spam blacklists

Introduction

Since October 2003 we have been publishing lists of spamvertized domains and 419 email sender addresses. The lists include domain names and email addresses extracted from spam by jwSpamSpy, our spam filter. We feed data into SURBL, which maintains lists that are among the most comprehensive and accurate of their kind.

Despite the high volume of additions we have maintained an extremely low error rate, a fact we take pride in. This low error rate is due to a conservative blacklisting policy (see the "Blacklisting policy" section below) and manual inspection. We also handle email inquiries about listings. Data published here was for research and non-commercial use. As of 2022-08-13 the domain blacklist is no longer available on this website. The download URIs marked as "(DEPRECATED)" are where the data was previously published.

If you are a commercial user, such as a corporate user or security vendor, and are interested in our data, please license SURBL data through Securityzones, its authorized reseller. You will receive real-time updates, a wider set of data, and data in different formats (rbldnsd, bind, CSV, RPZ, etc.).

Download URLs of plain text files
Here are plaintext versions of our blacklists. The domain blacklist consists of two files, the 419 blacklist of one file:

MD5 checksums
The following very small files contain hash codes computed from the above files. You can download the following files every hour or even every 15 minutes (make sure your script works properly before you try this rate!) and then run "md5sum -c filename" on each one. If the checksum fails it means the corresponding data file has changed and it's time to download it as well. That way you will never download copies of the actual data files unless they have changed.
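The check described above can also be done from a script. The following is a minimal sketch in Python, assuming the checksum files use the standard `md5sum` output format (a hex digest, a separator, and a file name per line). The function names and the `data_dir` parameter are hypothetical, and fetching the checksum file itself is left out.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_md5sum_line(line: str) -> tuple[str, str]:
    """Split one line of md5sum output into (hex_digest, filename)."""
    digest, _, name = line.strip().partition(" ")
    # md5sum separates digest and name with a space followed by either
    # another space (text mode) or '*' (binary mode).
    return digest.lower(), name.lstrip(" *")


def file_md5(path: Path) -> str | None:
    """Return the MD5 hex digest of a local file, or None if it is missing."""
    if not path.exists():
        return None
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_download(checksum_text: str, data_dir: Path) -> list[str]:
    """List the data files whose local copy is missing or out of date."""
    stale = []
    for line in checksum_text.splitlines():
        if not line.strip():
            continue
        digest, name = parse_md5sum_line(line)
        if file_md5(data_dir / name) != digest:
            stale.append(name)
    return stale
```

A polling script would download only the small checksum file on each run, pass its text to `needs_download`, and fetch a data file only if its name appears in the returned list.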

Blacklisting policy
We aim primarily to blacklist domains that have no legitimate uses. We do not list a number of domains that have questionable privacy policies or lack a confirmed opt-in (closed loop) subscription process and are often reported as spam, because some people do indeed subscribe to their sites.

The current blacklisting procedure has been in place since December 2003. All entries added to the list before that have been purged. Our false positive rate is less than one per month, which at our volume of additions corresponds to an error rate below 0.01%. None of the false positives have been widely used domains. Here are the main points about our process:

  • We are trying to be conservative in our blacklisting. We recognize that false positives are far more painful and costly than false negatives. That means: If in doubt, don't blacklist. Use built-in checks and double check whatever you can.

  • We don't blacklist on hearsay. Every entry is backed by at least one evidence email originally sent to our mailboxes or to customers of an ISP we're cooperating with. We recognize that there are Joe jobs, fake sender addresses and innocent bystanders mentioned in spam. We make efforts to detect these cases.

  • In order to minimize false positives, we start out with a pre-selected set of messages. Many of the mails we receive at our domains go to largely or completely unused accounts that we don't use to sign up for anything. Furthermore, unless these mails meet certain criteria, our spam filter won't even look at the embedded domain names. At our partner ISP every mail has to reach a certain SpamAssassin score before our filter gets to take a look at it.

  • Every mail then goes through our in-house spam filter, which extracts domain names, makes WHOIS queries and stores the results, together with other data about the original mails, in a database. It sorts domains by perceived spamminess, taking into account factors such as domain age, registrar, supporting name servers, Spamhaus SBL records for related servers, etc.

  • Domains registered by a small, fixed set of hardcore spammers, such as those behind many of the "OEM software" and pharmaceutical spams, are automatically detected and blacklisted.

  • Other domains get sorted into several bins for manual inspection. This is where it gets labour-intensive. We generally discard the least suspicious domains because there's too much of the more interesting stuff.

  • For the more suspicious ones we look at the reasons the filter didn't like the mail, the sender, the subject, the WHOIS info and the actual message itself, and we perform Google web searches, Google NANAS lookups, etc. To determine whether a mail may be a legitimate newsletter, we look for signs that mail to third parties might have been legitimate and subscribed to or, on the contrary, for signs of obfuscation meant to defeat filters. This is not always easy if the recipient is a third party, but there are certain patterns that can be detected.

  • The older a domain, the more evidence we require to list it. SBL listings are a strong indicator but not the sole determinant of spamminess. We always judge several factors in combination.

  • We don't currently have a process for purging discarded spam domains, but are working on that.

  • If a listing is challenged, we provide information about the email that triggered the listing, but without identifying the mailbox it was sent to. If the listing appears to be the result of a mistake, or if we think the domain is unlikely to appear in spam again, we remove the listing.
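
To make the shape of the process above more concrete, here is a rough sketch in Python of how such a triage step could be structured. It is an illustration only and not the actual jwSpamSpy logic: the field names, thresholds, weights and bin names are all invented for this example, and in practice the review bins go to a human who weighs further factors.

```python
from dataclasses import dataclass


@dataclass
class DomainEvidence:
    domain: str
    age_days: int         # hypothetical: derived from the WHOIS creation date
    evidence_mails: int   # spam samples received that mention the domain
    sbl_listed: bool      # a related server has a Spamhaus SBL record
    known_spammer: bool   # registrant matches a known spam operation


def required_mails(age_days: int) -> int:
    """Older domains need more evidence (thresholds are invented)."""
    if age_days < 30:
        return 1
    if age_days < 365:
        return 3
    return 10


def triage(e: DomainEvidence) -> str:
    """Sort a domain into a bin; only 'blacklist' is fully automatic."""
    if e.evidence_mails < 1:
        return "ignore"          # no listing without an evidence email
    if e.known_spammer:
        return "blacklist"       # small fixed set of hardcore spammers
    score = e.evidence_mails / required_mails(e.age_days)
    if e.sbl_listed:
        score += 0.5             # strong indicator, never sufficient alone
    if score >= 1.0:
        return "review-high"     # most suspicious bin for manual inspection
    if score >= 0.5:
        return "review-low"
    return "discard"             # least suspicious domains are dropped
```

For example, under these made-up thresholds a ten-day-old domain seen in one spam sample lands in `review-high`, while a domain more than ten years old with one sample and an SBL hit only reaches `review-low`.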