parp TODO list
==============

In approximate order of decreasing priority ...

* Daemon mode

What format should the incoming queue be?  I'm tempted to think mbox
with one file per message is simplest in terms of locking and
efficiency.  This would mean:

** parp-inject (already written)

A new tiny (read "very fast") executable (cf. qmail-inject) which
goes in .forward and does nothing more than transfer its STDIN into
$in_queue/$unique_id.  It co-operates with parp via flock(2), of
course.
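
A minimal sketch of the idea, not the actual parp-inject: the unique id
scheme (time, pid, counter, hostname), the file mode and the decision to
hold the lock for the whole write are all assumptions made for
illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use Sys::Hostname qw(hostname);

my $seq = 0;    # per-process counter so two calls in one second differ

# Copy everything readable from $in_fh into a new, uniquely named file in
# $in_queue, holding an exclusive flock(2) while writing.  Returns the path.
sub inject {
    my ($in_fh, $in_queue) = @_;

    # Hypothetical unique id: epoch seconds, pid, counter and hostname.
    my $unique_id = join '.', time, $$, $seq++, hostname();
    my $path      = "$in_queue/$unique_id";

    # O_EXCL makes the open fail rather than clobber an existing message.
    sysopen(my $out, $path, O_WRONLY | O_CREAT | O_EXCL, 0600)
        or die "Can't create $path: $!";
    flock($out, LOCK_EX) or die "Can't lock $path: $!";

    local $/ = \65536;    # read in 64K chunks rather than line by line
    while (defined(my $chunk = <$in_fh>)) {
        print {$out} $chunk or die "Write to $path failed: $!";
    }

    close($out) or die "Can't close $path: $!";    # close also drops the lock
    return $path;
}
```

From .forward this would be called as something like
inject(\*STDIN, $in_queue), with the daemon taking its own flock(2) on a
queue file before reading it.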

** parpd-check

(Maybe.)  A simple script which could be run from cron to check that
the daemon is still alive (cf. eggdrop's botchk) and restart it if
not.  This would be necessary because parp currently runs on a
per-user basis, and sysadmins sometimes reboot machines ...
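
One way this could look, assuming the daemon writes its pid to a file
(the pid file and the restart command are hypothetical; nothing above
says how the daemon records that it is running):

```perl
use strict;
use warnings;

# Return 1 if the pid recorded in $pid_file belongs to a live process
# that we are allowed to signal, otherwise 0.
sub daemon_alive {
    my ($pid_file) = @_;
    open(my $fh, '<', $pid_file) or return 0;    # no pid file: assume dead
    my $pid = <$fh>;
    close $fh;
    return 0 unless defined $pid && $pid =~ /^(\d+)\s*$/;
    # Signal 0 delivers nothing; it only tests whether we could signal $1.
    return kill(0, $1) ? 1 : 0;
}

# Restart the daemon if it is not running.  Returns 1 if a restart was
# attempted, 0 if the daemon was already alive.  @restart_cmd is
# whatever command starts the daemon.
sub check_and_restart {
    my ($pid_file, @restart_cmd) = @_;
    return 0 if daemon_alive($pid_file);
    system(@restart_cmd) == 0
        or warn "restart failed: @restart_cmd exited with status $?\n";
    return 1;
}
```

Since parp runs per-user, the cron job and the daemon share a uid, so
kill 0 failing with EPERM is not a concern here.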

* Rework filtering mechanism

to be much more general and flexible.  Key points are:

** use exceptions

Use exceptions to distinguish between terminating and non-terminating
`recipes'.
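
A rough sketch of how that could work.  Only the exception class name
Parp::TerminateFilter comes from these notes; its body, the recipe
calling convention and run_recipes() are assumptions.

```perl
use strict;
use warnings;

package Parp::TerminateFilter;    # name from these notes; body is a guess
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub throw { my $class = shift; die $class->new(@_) }

package main;
use Scalar::Util qw(blessed);

# Run each recipe in order.  A recipe that merely returns is
# non-terminating; one that throws Parp::TerminateFilter ends the run.
# Returns the number of recipes that were started.
sub run_recipes {
    my ($mail, @recipes) = @_;
    my $started = 0;
    for my $recipe (@recipes) {
        $started++;
        my $ok = eval { $recipe->($mail); 1 };
        next if $ok;
        my $err = $@;
        last if blessed($err) && $err->isa('Parp::TerminateFilter');
        die $err;    # anything else is a real error: re-throw it
    }
    return $started;
}
```

The point of the design is that a recipe needs no special return value:
falling off the end means "carry on", and only an explicit throw stops
the chain.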

** rework current accept/reject methods

*** new low-level filtering methods

Neither should be called from user code.

**** $m->_deliver(@dest_folders)

Marks the mail for delivery to @dest_folders.

**** $m->_terminate_filter(@dest_folders)

Throws a Parp::TerminateFilter exception.

*** new higher-level filtering methods

**** $m->accept($dest_folder, $reason, @details)

Instantiates a Parp::Reason from $reason and @details.
Registers acceptance in the log file and mail header X-Parp-Accepted.
Delivers to $dest_folder.

**** $m->final_accept($dest_folder, $reason, @details)

sub final_accept {
  my $self = shift;
  $self->accept(@_);
  $self->_terminate_filter();
}

**** $m->reject_junk_mail($reason)

sub reject_junk_mail {
  my $self = shift;
  # XXX accept() here looks like a copy-and-paste from final_accept
  $self->accept(@_);
  $self->_terminate_filter();
}

**** $m->categorize(@new_categories)

Adds mail to given categories.

*** delivery happens *after* filtering has finished
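
A sketch of what deferring delivery might mean in practice.  The class
Sketch::Mail, the flush_deliveries() method, the writer callback and the
choice to deliver to each folder only once are all hypothetical; only
_deliver(@dest_folders) is named in these notes.

```perl
use strict;
use warnings;

package Sketch::Mail;    # hypothetical stand-in for parp's mail object

sub new { my ($class) = @_; return bless { pending => [] }, $class }

# Mark the mail for delivery; nothing is written yet.
sub _deliver {
    my ($self, @dest_folders) = @_;
    push @{ $self->{pending} }, @dest_folders;
    return $self;
}

# Called once after all recipes have run.  $writer is a callback that
# does the real delivery for one folder.  Each folder is delivered to
# at most once.  Returns the number of distinct folders written.
sub flush_deliveries {
    my ($self, $writer) = @_;
    my %done;
    for my $folder (@{ $self->{pending} }) {
        next if $done{$folder}++;
        $writer->($self, $folder);
    }
    $self->{pending} = [];
    return scalar keys %done;
}

package main;
```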

* Replace $m->{from} etc. with methods

$m->{from} becomes $m->from etc., which is better OO, easier to type,
and catches misspellings: a mistyped method name dies at run time,
whereas a mistyped hash key silently yields undef.

Class::MakeMethods?
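
For comparison, hand-rolled accessors need very little code.  This is a
sketch in plain Perl rather than Class::MakeMethods; the class name and
the field list are made up for illustration.

```perl
use strict;
use warnings;

package Sketch::Message;    # hypothetical class name

sub new { my ($class, %fields) = @_; return bless {%fields}, $class }

# Generate one read-only accessor per field.  The field list is illustrative.
for my $field (qw(from to subject)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub { $_[0]{$field} };
}

package main;
```

With this in place $m->from returns the stored value, while a typo such
as $m->form dies with "Can't locate object method" instead of quietly
returning undef the way $m->{form} would.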

* Friends database

** Fix the bugger!

Why isn't the damn friends db working any more?

** Automatic addition

Add addresses from all mails which don't get flagged as spam to the
friends database.  Also automatic tweaking via -w wrong_class option.

* Simplify install procedure

** Automate conversion from procmail

Pinch Simon Cozens' script for this :-)

* Sanity checking on Received: parse

** MX records and relaying

One idea for improving parp's delivery analysis would be to introduce
a list of trusted relays into the user's MyFilter.pm configuration,
and then look out for untrusted relays by checking the MX records for
the domains contained in the `for <address>', To:, Cc:, and
Apparently-To: headers.  You could then maybe even automate checks for
open relays.

* Blacklist lookups

** Replace Parp::Blacklist with CPAN module?

There are several already out there, after all.

** Don't do blacklist lookups on known good hosts

Another improvement would be to avoid doing blacklist DNS lookups on
the hosts involved in the later stages of the delivery.  For example,
any mail sent to <adam@spiers.net> ends up with the first 4 Received:
headers always being the same delivery path which I know is good.
Avoiding DNS lookups on my own mail handlers every time I get an email
is obviously something I should get round to :-)
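
A sketch of one possible approach, assuming the hosts have already been
parsed out of the Received: headers and are listed newest first, as they
appear in the mail.  The function name and the trusted-host hash are
hypothetical.

```perl
use strict;
use warnings;

# Drop the leading run of trusted hops and return the rest, which are
# the hosts still worth a blacklist lookup.  Only hops at the top of
# the chain are skipped: once an untrusted hop is seen, every older
# header could have been forged, so a trusted name there earns nothing.
sub hosts_to_check {
    my ($trusted, @hops) = @_;    # @hops newest first
    shift @hops while @hops && $trusted->{ lc $hops[0] };
    return @hops;
}
```

For example, with mx1.example.net and mx2.example.net trusted, the chain
(mx1.example.net, mx2.example.net, relay.example.org, mx1.example.net)
is reduced to its last two entries.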

It would also be great to be able to automatically detect faked
Received: headers, but I can't think of a way to do it.

* Tests

** Regression tests

Can use any previously filtered mail as a regression test by comparing
all its /^X-Parp-/ headers with headers generated by a test run of the
filter on it.
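
The comparison itself could be as simple as the sketch below.  The
function names are made up, and treating the headers as an unordered set
is an assumption; it may turn out that header order matters too.

```perl
use strict;
use warnings;

# Collect the X-Parp-* headers from a raw message as "Name: value"
# strings, sorted.  Continuation lines are unfolded; the body is ignored.
sub parp_headers {
    my ($raw) = @_;
    my ($head) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $head =~ s/\r?\n[ \t]+/ /g;    # unfold continuation lines
    my @found = grep { /^X-Parp-/ } split /\r?\n/, $head;
    return sort @found;
}

# True if two versions of a message carry identical X-Parp-* headers.
sub same_verdict {
    my ($old_raw, $new_raw) = @_;
    return join("\n", parp_headers($old_raw))
        eq join("\n", parp_headers($new_raw));
}
```

A test run would strip the X-Parp-* headers from a saved mail, filter
it again, and pass the saved and re-filtered versions to same_verdict().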

** Unit tests

For each module.

*** Parp::Locking

Extend flock_test.sh to test this.

* Documentation!

Yeah, it's hackerware, but even hackers should have docs.

* Auto-responders

Possible uses of auto-responders:

** automatically complaining about spam to the relevant authorities

** informing people of the password system

** `vacation' emulator

* Support for filtering inside nested multiparts

I don't think it works yet.  Only affects content-based tests, not
header-based ones, of course.

* Loop protection for all replied/forwarded mail.

I can't remember what I meant by that.

* More probabilistic/neural network based approach

... rather than the current Boolean one.

I'm still undecided about this one.

My current thinking is that it is best to keep the black-and-white
Boolean logic of the existing spam-or-not decision tree, but that
replacing the ugly max_quite_bad_words and
max_unique_quite_bad_words hack in has_spam_content() with ifile's
algorithm would probably work an absolute treat.

  http://www.ai.mit.edu/~jrennie/ifile/

ifile depends on mh.  Yuk.

Just spotted this:

  http://spamassassin.taint.org/

Looks quite nice, supposed to be very accurate too.  Might have
to try it out, pinch some of the ideas ;-)

**** Local variables:
**** mode:outline
**** End:


