parp - Perl-based Anti-spam Replacement for Procmail
parp is a powerful, extensible,
hackerware e-mail filter with sophisticated anti-spam capabilities.
It's written in Perl, so theoretically it can be run on about 70
different platforms. However, it was written with UNIX systems in
mind, and so far has only been tested on RedHat Linux.
N.B. The user configuration file is written in
pure Perl. Please understand this means that unless you have
a certain degree of experience in Perl or similar programming
languages, you currently stand very little chance of being able to use
the filter. That's what I meant when I said it was
hackerware. If you're in any doubt as to whether you'll be able to
"hack it" (literally), take a look at the sample
configuration file provided. If it looks fairly understandable
then chances are you can use parp. If not, you're welcome to try
anyway, but I can't be held responsible for the consequences! (Not
that I can anyway.)
Also please note: the filter is currently working beautifully for me,
but has had next to no external testing. If you let it loose on your
own e-mail, make sure you take the necessary precautions!
This was yet another
personal itch which needed scratching. I receive between 5 and 20
spam e-mails most days. Not only was it mildly annoying to have to hit
delete more often than normal, but I also forward all e-mail which ends up
in my main inbox to my mobile (cell) phone via email2sms and an
Internet/SMS gateway, and I was sick to death of my phone bleeping
throughout the day purely because of junk mail.
I started looking at all the available anti-spam filters. Over a
period of two years, I looked at many, including the NAGS filter,
despam, various complex anti-spam procmailrcs, the spamometer,
blackmail, filter.plx, zfilter, spamstop, junkfilter ... but various
things put me off all of them:
- Some weren't written in Perl. Call me a Perl bigot, but if
there was ever a case of Perl being the right tool for
the job, it's an e-mail filter. Extensibility and
maintainability were very high on my list.
- Some were terribly coded. I refuse to put my e-mail at the
mercies of bad code (and that includes sendmail ;-).
- Some insisted that you use a particular MDA or MUA. I have no
intention of changing from mutt and qmail.
- Many filtered on only the headers, or only the body. I want to
filter on both, not all the time, but in some circumstances.
- None were as accurate as I wanted. My goal was at least 99%
accuracy. (At the time of writing, parp's accuracy is hovering
around the 99.8% mark.)
So, I resolved to write my own anti-spam filter, taking the best
features of all the others. As I started, I realised that it wouldn't
be that much more effort to scrap my .procmailrc and rewrite it as
part of this filter, and it would be well worth the power gained.
Besides, I was fed up with procmail's rather clumsy configuration
language, and I missed being able to use Perl's regular
expression syntax.
- Can act as a filter in a similar manner to procmail, or
directly on files in Mbox format. All standard filtering
actions are available.
- Highly sophisticated spam detection heuristics: currently
around 40 different tests are performed in the worst case,
although all tests are optimised for speed (e.g. fast tests are
performed on the headers first, then slower tests are performed
on the body only if necessary).
- Optional cross-checking against the ordb.org open relay database.
- Filter adds X-Parp-Accepted: and X-Parp-Rejected: headers so
that you can easily monitor its filtering strategy without
leaving your mail reader.
- MIME multi-part aware, e.g. will not be confused by binary
attachments.
- Berkeley DB format friends database, for keeping false positives to
an absolute minimum.
- Friends' address extraction mode for easily making new friends.
The friends database is easily editable with my dbm utility.
- Other `grace' tests allowing bona fide persons' communications
through (e.g. passworded e-mails) just in case all the other
tests go badly wrong.
- The configuration file is written in
raw Perl, so you can extend the filter arbitrarily using the
main program's API.
- Comprehensive logging and error-trapping systems.
- Auxiliary program to print out comprehensive statistics on all
aspects of filtering (see the sample
output).
- Ability to log false positives/negatives when spam detection
has gone wrong in a way which can be interpreted by the
statistics program to determine the filter's current accuracy
of spam detection.
- Mostly RFC822-compliant state machine
parser of Received headers, enabling extensive spam trace
analysis and retaliative action. Read its man page or source if you're curious.
- Duplicate removal (by message ID).
- No decent documentation yet.
- Requires some knowledge of Perl / programming. (Ironically, if
it didn't, there would be far greater limitations to the
filter's flexibility.)
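As a rough illustration of the "fast header tests first, slower body tests only if necessary" strategy from the feature list, here is a minimal sketch. It is written in Python purely for illustration and is not parp's code: parp itself is Perl. The friends set, the two test functions, the example patterns and the text placed in the X-Parp-Accepted: / X-Parp-Rejected: headers are all assumptions made up for this example.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical friends list (parp keeps its friends in a Berkeley DB file).
FRIENDS = {"alice@example.org"}


def header_tests(msg):
    """Cheap checks that only look at headers. Returns a reason or None."""
    if re.search(r"make money fast", msg.get("Subject", ""), re.IGNORECASE):
        return "suspicious subject"
    if msg.get("To") is None and msg.get("Cc") is None:
        return "no To or Cc header"
    return None


def body_tests(msg):
    """Slower checks that read the body, skipping non-text MIME parts."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # multipart containers and binary attachments
        text = part.get_payload(decode=True) or b""
        if re.search(rb"click here to unsubscribe", text, re.IGNORECASE):
            return "spam phrase in body"
    return None


def classify(raw):
    """Return (verdict, message) with an X-Parp-* header recording why."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in FRIENDS:
        verdict, reason = "Accepted", "sender is a friend"
    else:
        # "or" short-circuits: the body is only scanned when every
        # header test has passed.
        reason = header_tests(msg) or body_tests(msg)
        verdict = "Rejected" if reason else "Accepted"
        reason = reason or "passed all tests"
    msg["X-Parp-" + verdict] = reason
    return verdict, msg
```

Checking friends before anything else is what keeps false positives down in this sketch: a known correspondent is accepted even if a later test would have fired.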
The to-do list has been moved into the TODO file in the
release tarball.
You'll need Perl 5.005 or later, and the following Perl
modules, available from CPAN:
- Digest::MD5 (for calculating a unique message ID)
- Mail::Box
- Mail::Filter
- Mail::Address
- Mail::Internet
- Mail::Field
- Mail::Field::Received (this one also available locally)
- Net::DNS (only if you want to enable RSS cross-checking)
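Digest::MD5 is listed above as being used to calculate a unique message ID, and duplicates are removed by message ID. The sketch below shows one way such a scheme could work. It is in Python for illustration only and is not taken from parp's source. In particular, hashing the Message-ID header and falling back to the whole message when that header is missing is an assumption, as are the function names.

```python
import hashlib
from email import message_from_string


def message_key(raw):
    """Return an MD5 hex digest that identifies a message."""
    msg = message_from_string(raw)
    msg_id = msg.get("Message-ID")
    # Assumption: use the Message-ID when present, else the whole message.
    basis = str(msg_id).strip() if msg_id else raw
    return hashlib.md5(basis.encode("utf-8", "replace")).hexdigest()


def drop_duplicates(raw_messages):
    """Keep the first copy of each message, preserving order."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = message_key(raw)
        if key in seen:
            continue  # already delivered a message with this ID
        seen.add(key)
        unique.append(raw)
    return unique
```

A real filter would need to keep the set of seen keys on disk between runs, since each delivery is a separate process.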
Here are brief instructions (hey, I did say it was hackerware):
- Ensure you have the prerequisites installed.
- Edit the configuration file MyFilter.pm.sample to suit
your own filtering needs, and save it as ~/.parp/MyFilter.pm
(or elsewhere if you know how to instruct Perl to find it).
- Check that parp happily compiles by running it with the
-h option.
- Invoke parp in test mode (the
-t option) as a filter (taking one
e-mail from STDIN) or on one or more folders (with the
-f option).
- Put something appropriate in your .forward or
.qmail, or do whatever else is necessary to get
your MDA using parp as a filter. Don't forget to enable the
nice extras, like the -d option for
discarding duplicates, or the -r option
for RSS cross-checking.
- Send yourself some test e-mails to make sure they're getting
dealt with as you'd wish.
- Enjoy an almost spam-free life immediately.
- If the filter mistakenly classifies a spam as bona fide, or
vice-versa, invoke the filter on that e-mail with the
-w option. (Binding a key press in your
favourite e-mail reader to pipe a selected e-mail to
parp -wv is recommended.) This writes a
comment to the logfile that the filter got it wrong, which can
be used by the accompanying statistics program.
- Tweak the configuration file every time you notice the filter
getting something wrong. Bind a key press in your mail reader
to pipe a selected e-mail to parp -vt.
Test your new configuration easily using the above key press.
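To make the accuracy figure concrete: if each mistake reported with the -w option is counted as either a false positive or a false negative, accuracy can be worked out as the share of filtered e-mails that were classified correctly. The Python sketch below shows that arithmetic only. The formula and the function name are assumptions, and the statistics program shipped with parp may calculate it differently.

```python
def accuracy(total_filtered, false_positives, false_negatives):
    """Percentage of filtered e-mails that were classified correctly."""
    if total_filtered <= 0:
        raise ValueError("total_filtered must be positive")
    wrong = false_positives + false_negatives
    return 100.0 * (total_filtered - wrong) / total_filtered
```

For example, two logged mistakes in 1000 filtered e-mails corresponds to 99.8% accuracy.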
There isn't currently any proper documentation. However, as noted
above, you probably shouldn't be using this filter unless you have
some experience as a (Perl) programmer, in which case you should
hopefully find the code very readable. In particular, search for the
line containing "sub categorize {". This routine contains
the heart of the filter's categorization (including spam detection)
strategy. Also search for the line "sub filter {", as this
summarises all action taken for each e-mail filtered.
As with all my software, all suggestions / bug
reports / patches (unified or context diff only please) are very
welcome; please contact me by e-mail.
Last updated: Thu Nov 22 14:12:06 2001
© 1995-2001
Adam Spiers <adam@spiers.net>