spam filtering software

 

Backing up your Bayesian database

This is an important item. Over time, you might find your Bayesian database becoming less effective. Sometimes a Bayesian database can become corrupt, too. If this happens, you have few options other than deleting the database files and starting training again.

To recognize a poorly performing Bayesian database, refer again to the headers of sample false positives or false negatives (whichever you're having a problem with).
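As a rough illustration of that header check, here is a small Python sketch that pulls the Bayes rule name (BAYES_00, BAYES_50, BAYES_99 and so on) out of a saved message. It assumes your installation writes the usual X-Spam-Status header with a tests= list; the function name bayes_rule is made up for this example.

```python
import re
from email import message_from_string

# Matches Bayes rule names such as BAYES_00, BAYES_50 or BAYES_99.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")

def bayes_rule(raw_message):
    """Return the BAYES_nn rule named in X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # Long headers are folded across lines; collapse the whitespace first.
    status = " ".join(status.split())
    match = BAYES_RULE.search(status)
    return match.group(0) if match else None
```

If obvious spam samples keep coming back with a low rule such as BAYES_00, or good ham keeps hitting BAYES_99, that is a hint the database is the layer letting you down.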

Sometimes spam is so well crafted that it's next to impossible for a token signature system to tell it apart from ham. That's one of the reasons for using a multi-layered approach to catching spam: some of the other layers will hopefully contribute enough to catch these messages.

Other times, the scoring is so obviously out of whack that it's doing more harm than good, scoring obvious spams low and good hams high. This can happen due to neglect or some other cause of corruption.

It's important to have a backup of your Bayesian database when this happens. It can really save your bacon when your database is no longer performing well. But even if you do have a backup, if backups aren't made regularly, that backup might hold very old or obsolete tokens that are due to expire. It may perform no better than the bad database, not because it's bad too, but because its tokens are based on spamming campaigns that have since been dropped in favor of new clients or campaigns.
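One way to keep backups regular is to script them. The Python sketch below copies the two database files into a fresh date-stamped folder each time it runs. It assumes the file-based storage with bayes_toks and bayes_seen; the function name backup_bayes and both directory arguments are placeholders for this example, so point them at your own paths.

```python
import shutil
import time
from pathlib import Path

# File names taken from the text; other storage backends will differ.
BAYES_FILES = ("bayes_toks", "bayes_seen")

def backup_bayes(bayes_dir, backup_root):
    """Copy the Bayes files into a date-stamped folder and return its path."""
    target = Path(backup_root) / time.strftime("bayes-%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    for name in BAYES_FILES:
        source = Path(bayes_dir) / name
        if source.exists():
            # copy2 keeps timestamps, which helps when judging a backup's age.
            shutil.copy2(source, target / name)
    return target
```

It's probably safest to run this while the spam filter is stopped, so a file isn't copied halfway through a write. SpamAssassin's own sa-learn tool also has --backup and --restore options, which may suit you better than copying raw files.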

If that's the case, just delete the database files and start out with a fresh database. Restart your spam-fighting process so SpamAssassin will re-create the empty database files.
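If deleting outright makes you nervous, a gentler variant is to rename the old files so they're out of the way but still recoverable. This Python sketch does that; it is a swapped-in alternative to plain deletion, and the function name reset_bayes and the .old suffix are just choices made for this example.

```python
from pathlib import Path

def reset_bayes(bayes_dir, names=("bayes_toks", "bayes_seen")):
    """Rename existing Bayes files to *.old and return the names that were moved."""
    moved = []
    for name in names:
        path = Path(bayes_dir) / name
        if path.exists():
            # replace() overwrites any earlier *.old file with the same name.
            path.replace(path.with_name(name + ".old"))
            moved.append(name)
    return moved
```

Once the new database has proven itself, the .old files can be removed by hand.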

Size can also be an indicator of a poorly performing database, or one that has been bloated with too many samples, causing a low signal-to-noise ratio and reducing the effectiveness of your Bayesian database. This can also be fixed by either deleting the database files or restoring a backup. A Bayesian database can get quite large; several megabytes is not uncommon.
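To keep an eye on size, something like the following Python sketch can report how big each file is. The function name bayes_sizes_mb is invented for this example, and no particular limit is implied; what counts as "too big" is your call, based on how the database performed at earlier sizes.

```python
from pathlib import Path

def bayes_sizes_mb(bayes_dir, names=("bayes_toks", "bayes_seen")):
    """Return {file name: size in megabytes} for the Bayes files that exist."""
    sizes = {}
    for name in names:
        path = Path(bayes_dir) / name
        if path.exists():
            sizes[name] = path.stat().st_size / (1024 * 1024)
    return sizes
```

Logging these numbers alongside each backup gives you a simple history to compare against when performance drops.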

To locate the files you're going to back up or delete, search your hard drive for the files bayes_toks and bayes_seen. Most SpamAssassin installations keep these in a directory called bayes, usually at the same directory level as the rules.
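If your operating system's search tool isn't handy, the search can be sketched in a few lines of Python. The function name find_bayes_dirs is made up for this example, and the starting folder is whatever you pass in; walking a whole drive can take a while, so start from the SpamAssassin install folder if you know it.

```python
import os

def find_bayes_dirs(root, names=("bayes_toks", "bayes_seen")):
    """Return sorted directories under root that contain any of the Bayes files."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name in filenames for name in names):
            found.add(dirpath)
    return sorted(found)
```

More than one hit usually means more than one user or installation has its own database, so check which one your running filter actually uses before deleting anything.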