parp - Perl-based Anti-spam Replacement for Procmail

parp is a powerful, extensible, hackerware e-mail filter with sophisticated anti-spam capabilities. It's written in Perl, so in theory it can run on about 70 different platforms. However, it was written with UNIX systems in mind, and has so far only been tested on Red Hat Linux.

N.B. The user configuration file is written in pure Perl. Please understand that this means unless you have a certain degree of experience with Perl or a similar programming language, you currently stand very little chance of being able to use the filter. That's what I meant when I said it was hackerware. If you're in any doubt as to whether you'll be able to "hack it" (literally), take a look at the sample configuration file provided. If it looks fairly understandable, chances are you can use parp. If not, you're welcome to try anyway, but I can't be held responsible for the consequences! (Not that I can anyway.)

Also please note: the filter is currently working beautifully for me, but has had next to no external testing. If you let it loose on your own e-mail, make sure you take the necessary precautions!

Motivation and goals

This was yet another personal itch that needed scratching. I receive between 5 and 20 spam e-mails most days. Having to hit delete more often than usual was mildly annoying, but worse, I also forward all e-mail that ends up in my main inbox to my mobile (cell) phone via email2sms and an Internet/SMS gateway, and I was sick to death of my phone bleeping throughout the day purely because of junk mail.

Over a period of two years, I looked at many of the available anti-spam filters, including the NAGS filter, despam, various complex anti-spam procmailrcs, the spamometer, blackmail, filter.plx, zfilter, spamstop and junkfilter, but various things put me off all of them:

So I resolved to write my own anti-spam filter, taking the best features of all the others. As I started, I realised that it wouldn't be much more effort to scrap my .procmailrc and rewrite it as part of this filter, and that the power gained would make it well worth it. Besides, I was fed up with procmail's rather clumsy configuration language, and I missed being able to use Perl's regular expression syntax.

Features

Limitations

Wishlist

This has been moved into the TODO file in the release tarball.

Prerequisites

You'll need Perl 5.005 or later, and the following Perl modules, available from CPAN:

Download

Installation

Here are brief instructions (hey, I did say it was hackerware):

  1. Ensure you have the prerequisites installed.
  2. Edit the configuration file MyFilter.pm.sample to suit your own filtering needs, and save it as ~/.parp/MyFilter.pm (or elsewhere if you know how to instruct Perl to find it).
  3. Check that parp compiles cleanly by running it with the -h option.
  4. Invoke parp in test mode (the -t option), either as a filter (taking one e-mail from STDIN) or on one or more folders (with the -f option).
  5. Put something appropriate in your .forward or .qmail, or do whatever else is necessary to get your MDA to use parp as a filter. Don't forget to enable the nice extras, like the -d option for discarding duplicates, or the -r option for RSS cross-checking.
  6. Send yourself some test e-mails to make sure they're getting dealt with as you'd wish.
  7. Enjoy an almost spam-free life immediately.
  8. If the filter mistakenly classifies a spam as bona fide, or vice versa, invoke the filter on that e-mail with the -w option. (Binding a key press in your favourite e-mail reader to pipe a selected e-mail to parp -wv is recommended.) This writes a comment to the log file noting that the filter got it wrong, which the accompanying statistics program can use.
  9. Tweak the configuration file every time you notice the filter getting something wrong. To test each new configuration easily, bind a key press in your mail reader to pipe a selected e-mail to parp -vt.
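The dry-run part of the steps above (step 4) can be scripted so that a handful of saved messages are re-checked in one go after each configuration tweak. This is a minimal sketch, assuming parp is on your $PATH and each message is saved in its own file; parp_dry_run is a hypothetical helper name, not something shipped with parp. Because test mode takes one e-mail from STDIN, the loop runs one parp invocation per message and uses only the -t option described above.

```shell
# Hypothetical helper (not part of parp): re-run parp in test mode (-t)
# over several saved messages, one file per message.
# Assumes parp is installed somewhere on $PATH.
parp_dry_run() {
    for msg in "$@"; do
        # Label each message so the output from successive runs is easy to tell apart.
        printf '== %s\n' "$msg"
        # Test mode reads a single e-mail from STDIN, so feed each file separately.
        parp -t < "$msg"
    done
}
```

For example, running parp_dry_run over a directory of saved spam and bona fide samples after editing ~/.parp/MyFilter.pm gives a quick regression check before the new configuration sees live mail.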

Documentation

There isn't currently any proper documentation. However, as noted above, you probably shouldn't be using this filter unless you have some experience as a (Perl) programmer, in which case you should hopefully find the code very readable. In particular, search for the line containing "sub categorize {". This routine contains the heart of the filter's categorization (including spam detection) strategy. Also search for the line "sub filter {", as this summarises all the actions taken for each e-mail filtered.
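To jump straight to those two routines, a plain grep is enough. This is a small sketch assuming a POSIX shell and grep; find_parp_core is a hypothetical helper name, and you pass it the path to your own copy of the parp script.

```shell
# Hypothetical helper: print the line numbers of the two routines worth
# reading first. In a basic regular expression an unescaped "{" is literal.
find_parp_core() {
    grep -n -e 'sub categorize {' -e 'sub filter {' "$1"
}
```

Running find_parp_core on the parp script prints each matching line prefixed with its line number, in the order the routines appear in the file.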

Feedback

As with all my software, all suggestions / bug reports / patches (unified or context diff only please) are very welcome; please contact me by e-mail.


Last updated: Thu Nov 22 14:12:06 2001
© 1995-2001 Adam Spiers <adam@spiers.net>