<#title>parp - Perl-based Anti-spam Replacement for Procmail#title>
<#def name="samples">files/sample-confs#def>
parp - Perl-based Anti-spam Replacement for Procmail
 <#exe>parp#exe> is a powerful, extensible,
hackerware e-mail filter with sophisticated anti-spam capabilities.
It's written in <#perl>, so theoretically it can be run on about 70
different platforms.  However, it was written with UNIX systems in
mind, and so far has only been tested on RedHat Linux.
 N.B. The user configuration file is written in
pure Perl.  Please understand this means that unless you have
a certain degree of experience in Perl or similar programming
languages, you currently stand very little chance of being able to use
the filter to its full potential.  That's what I meant when I
said it was hackerware.  If you're in any doubt as to whether you'll
be able to "hack it" (literally), take a look at the sample configuration files provided.  If they
look fairly understandable then chances are you can use parp.  If not,
you're welcome to try anyway, but I can't be held responsible for the
consequences!  (Not that I can anyway.)
Also please note: the filter is currently working beautifully for me,
but has had next to no external testing.  If you let it loose on your
own e-mail, make sure you take the necessary precautions!
 This was yet another 
personal itch which needed scratching.  I receive between 5 and 20
spam e-mails most days.  Not only was it mildly annoying to have to hit
delete more than normal, but I also forward all e-mail which ends up
in my main inbox to my mobile (cell) phone via email2sms and an Internet/SMS gateway, and I
was sick to death of my phone bleeping throughout the day
purely due to junk mail.
 I started looking at all the available anti-spam filters.  Over a
period of two years, I looked at many, including the NAGS filter,
despam, various complex anti-spam procmailrcs, the spamometer,
blackmail, filter.plx, zfilter, spamstop, junkfilter ... but various
things put me off all of them:
  -  Some weren't written in <#perl>.  Call me a Perl bigot, but if
       there was ever a case of Perl being the right tool for
       the job, it's an e-mail filter.  Extensibility and
       maintainability were very high on my list.
  
-  Some were terribly coded.  I refuse to put my e-mail at the
       mercies of bad code (and that includes sendmail ;-).
  
-  Some insisted that you use a particular MDA or MUA.  I have no
       intention of changing from mutt and qmail.
  
-  Many filtered on only the headers, or only the body.  I want to
       filter on both, not all the time, but in some circumstances.
  
-  None were as accurate as I wanted.  My goal was at least 99%
       accuracy.  (At the time of writing, parp's accuracy is hovering
       around the 99.8% mark.)
 So, I resolved to write my own anti-spam filter, taking the best
features of all the others.  As I started, I realised that it wouldn't
be that much more effort to scrap my .procmailrc and rewrite it as
part of this filter, and it would be well worth the power gained.
Besides, I was fed up of procmail's rather clumsy configuration
language, and I missed being able to use <#perl>'s regular
expression syntax.
  -  Can act as a filter in a similar manner to procmail, or
       directly on files in Mbox format (and possibly other formats
       via <#pm>Mail::Box#pm> - untested), or as a daemon processing
       mails from a spool.  In the latter case, mails are injected
       into the queue via a tiny (15k on my system) executable which
       handles locking correctly.
  
-  Standard filtering actions are available (deliver to
       mailbox, pipe to command, reject as junk etc.)
  
-  Highly sophisticated spam detection heuristics: currently
       around 40 different tests are performed in a worst case scenario,
       although all tests are optimised for speed (e.g. fast tests are
       performed on headers, then slower tests are only performed on
       the body if necessary).  N.B. I'm considering incorporating the
       SpamAssassin ruleset at some point too.
  
-  Optional cross-checking with the Open Relay Database.
  
-  Filter adds X-Parp-Accepted: and X-Parp-Rejected: headers so
       that you can easily monitor its filtering strategy without
       leaving your mail reader.
  
-  MIME multi-part aware, e.g. will not be confused by binary
       attachments.
  
-  Berkeley DB format friends database, for keeping false positives to
       an absolute minimum.
  
-  Automatic extraction of addresses into the friends database
       from emails which pass the spam tests.  Semi-automatic removal
       of addresses from the friends database on the rare occasions
       parp gets it wrong.  The friends database is also easily
       editable with my dbm utility.
  
-  Other `grace' tests allowing bona fide persons' communications
       through (e.g. passworded e-mails) just in case all the other
       tests go badly wrong.
  
-  The configuration files
       are written in raw <#perl>, so you can extend the filter
       arbitrarily using the main program's API.
  
-  Comprehensive logging and error-trapping systems.
  
-  Auxiliary program to print out comprehensive statistics on all
       aspects of filtering (see the sample
       output).
  
-  Ability to log false positives/negatives when spam detection
       has gone wrong in a way which can be interpreted by the
       statistics program to determine the filter's current accuracy
       of spam detection.
  
-  Mostly RFC822-compliant state machine
       parser of Received headers, enabling extensive spam trace
       analysis and retaliative action.  Read its man page or source if you're curious.
  
-  Duplicate removals (by message id).
  
-  Emails which have already been filtered can be used as
       regression tests, to easily spot problems when you make changes
       to your filtering logic.
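
 To make a few of the features above more concrete (duplicate removal
by message id, the friends database, and fast header tests before
slower body tests), here is a minimal sketch in plain Perl.  It is not
parp's actual code or API: the function name classify, the individual
tests, and the in-memory hashes standing in for the Berkeley DB
friends database and the duplicate store are all assumptions made
purely for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical stand-ins for the friends database and the duplicate
# store; plain hashes are used so the sketch runs anywhere.
my %friends = ('alice@example.org' => 1);
my %seen;

# Illustrative tests only; these are NOT parp's real heuristics.
my @header_tests = (
    [ 'missing To header', sub { !defined($_[0]->{to}) } ],
    [ 'subject in capitals', sub {
          my $s = $_[0]->{subject};
          defined($s) && $s =~ /[A-Z]/ && $s !~ /[a-z]/;
      } ],
);
my @body_tests = (
    [ 'removal boilerplate', sub { $_[0] =~ /click here to be removed/i } ],
);

# Takes a hash ref of lower-cased header names and a body string,
# and returns a verdict string.
sub classify {
    my ($headers, $body) = @_;

    # Duplicate check: key on a digest of the Message-ID, if present.
    if (defined($headers->{'message-id'})) {
        my $key = md5_hex($headers->{'message-id'});
        return 'duplicate' if $seen{$key}++;
    }

    # Assumption: known friends skip the spam tests entirely.
    my $from = defined($headers->{from}) ? lc($headers->{from}) : '';
    return 'accepted: friend' if $friends{$from};

    # Fast header tests first ...
    for my $test (@header_tests) {
        my ($name, $code) = @$test;
        return "rejected: $name" if $code->($headers);
    }
    # ... slower body tests only if the headers looked clean.
    for my $test (@body_tests) {
        my ($name, $code) = @$test;
        return "rejected: $name" if $code->($body);
    }
    return 'accepted: passed all tests';
}

# Example call with made-up data.
print classify(
    { from => 'bob@example.com', to => 'me@example.org',
      subject => 'MAKE MONEY FAST', 'message-id' => '<1@example.com>' },
    'Body text is never inspected here, as a header test fires first.',
), "\n";
```

 Ordering the tests from cheapest to most expensive means most spam is
rejected without the body ever being scanned, which is the point of
the speed optimisation described in the list above.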
  -  Limited documentation so far.  This is gradually improving.
  
-  Requires some knowledge of Perl / programming.  (Ironically, if
       it didn't, there would be far greater limitations to the
       filter's flexibility.)
The to-do list has been moved into the TODO file in the
release tarball.
 You'll need <#perl> 5.6.0 or later, and the following Perl
modules, available from <#cpan>:
  - <#pm>Digest::MD5#pm> (for calculating a unique message ID)
  
- <#pm>Mail::Box#pm>
  
- <#pm>Mail::Filter#pm>
  
- <#pm>Mail::Address#pm>
  
- <#pm>Mail::Internet#pm>
  
- <#pm>Mail::Field#pm>
  
- <#pm>Mail::Field::Received#pm> (this one also available locally)
  
- <#pm>Net::DNS#pm> (only if you want to enable RSS cross-checking)
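
 If you'd like to check which of these are already installed before
going any further, something along the following lines should do.
This is only a sketch and is not shipped with parp; the helper name
have_module is made up for the example, and the module names are
simply copied from the list above.

```perl
use strict;
use warnings;

# Module names are taken from the prerequisites list; Net::DNS is only
# needed for RSS cross-checking, so it is reported as optional.
my @required = qw(
    Digest::MD5 Mail::Box Mail::Filter Mail::Address
    Mail::Internet Mail::Field Mail::Field::Received
);
my @optional = qw(Net::DNS);

# Try to load a module by name; returns 1 if it loads, 0 otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my $missing = 0;
for my $module (@required) {
    my $ok = have_module($module);
    $missing++ unless $ok;
    printf "%-24s %s\n", $module, $ok ? 'ok' : 'MISSING';
}
for my $module (@optional) {
    printf "%-24s %s\n", $module,
        have_module($module) ? 'ok' : 'missing (optional)';
}

# $] holds the Perl version as a number; 5.6.0 is written 5.006.
printf "%-24s %s\n", "perl $]", $] >= 5.006 ? 'ok' : 'too old';
print "$missing required module(s) missing\n";
```

 Anything reported as MISSING can then be fetched from <#cpan> in the
usual way.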
 Here are brief instructions (hey, I did say it was hackerware):
  -  Ensure you have the prerequisites installed.
  
-  Install the <#pm>Parp::*#pm> modules via the standard Perl
       <#keyinput>perl Makefile.PL; make install#keyinput>
       procedure.
  
-  Edit the configuration files <#pm><#samples>#pm> to suit
       your own filtering needs, and save them in
       <#dir>~/.parp#dir>
       (or elsewhere if you know how to instruct Perl to find it).
  
-  Install <#exe>parp#exe> somewhere in your $PATH and check that it happily compiles by running it with the
       <#keyinput>-h#keyinput> option.
  
-  Invoke <#exe>parp#exe> in test mode (the
       <#keyinput>-t#keyinput> option) as a filter (taking one
       e-mail from STDIN) or on one or more folders (with the
       <#keyinput>-f#keyinput> option).
  
-  If you want to use parp in daemon mode (highly recommended),
       type <#keyinput>make parp-inject#keyinput> to compile the
       executable which inserts mails into the daemon queue.  Then put
       something like <#keyinput>/path/to/parp-inject
       /home/me/mail/spool#keyinput> in your <#file>.forward#file>
       or <#file>.qmail#file>, or do whatever else is necessary to
       get your MDA using parp as a filter.  Finally, start up the
       daemon with the option <#keyinput>-D
       /home/me/mail/spool#keyinput>.
  
-  When you invoke parp, don't forget to enable the nice extras if
       appropriate, like the <#keyinput>-d#keyinput> option for
       discarding duplicates, or the <#keyinput>-r#keyinput> option
       for RSS cross-checking.
  
-  Send yourself some test e-mails to make sure they're getting
       dealt with as you'd wish.
  
-  Enjoy an almost spam-free life immediately. 
  
-  If the filter mistakenly classifies a spam as bona-fide, or
       vice-versa, invoke the filter on that e-mail with the
       <#keyinput>-w#keyinput> option.  (Binding a key press in your
       favourite e-mail reader to pipe a selected e-mail to
       <#keyinput>parp -wv#keyinput> is recommended.)  This writes a
       comment to the logfile that the filter got it wrong, which can
       be used by the accompanying statistics program.  It also
       removes the appropriate addresses from the friends database if
       they happened to be in there already.
  
-  Tweak the filter logic every time you notice the filter
       behaving in a way which doesn't satisfy you.  Bind a key press
       in your mail reader to pipe a selected e-mail to
       <#keyinput>parp -vt#keyinput>.  Test your new configuration
       easily using the above key press.
 There isn't currently much documentation except fairly complete
POD for each of the modules.  This will hopefully change soon.
However, as noted above, you probably shouldn't be using this filter
unless you have some experience as a (Perl) programmer, in which case
you should hopefully find the code very readable.  In particular, read
the documentation for <#pm>Parp::Filter#pm> and
<#pm>Parp::Config#pm> via <#keyinput>perldoc#keyinput>, as these
should give you a fair idea of how to get started customising the
filter to your own tastes.
<#feedback>