Problem with SpamAssassin and Bayes

Hello,

By default, SpamAssassin is configured with the use_bayes and bayes_auto_learn options enabled.

use_bayes ( 0 | 1 ) (default: 1)

Whether to use the naive-Bayesian-style classifier built into SpamAssassin. This is a master on/off switch for all Bayes-related operations.

use_bayes_rules ( 0 | 1 ) (default: 1)

Whether to use rules using the naive-Bayesian-style classifier built into SpamAssassin. This allows you to disable the rules while leaving auto and manual learning enabled.

bayes_auto_learn ( 0 | 1 ) (default: 1)

Whether SpamAssassin should automatically feed high-scoring mails (or low-scoring mails, for non-spam) into its learning systems. The only learning system supported currently is a naive-Bayesian-style classifier.
Note that certain tests are ignored when determining whether a message should be trained upon:

  • rules with tflags set to ‘learn’ (the Bayesian rules)
  • rules with tflags set to ‘userconf’ (user white/black-listing rules, etc.)
  • rules with tflags set to ‘noautolearn’

Also note that auto-training occurs using scores from either scoreset 0 or 1, depending on what scoreset is used during message check. It is likely that the message check and auto-train scores will be different.
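To make the note above concrete, here is a rough Python sketch of how an auto-learn decision could be made. It is a simplification, not SpamAssassin's actual code. The thresholds are the documented defaults of bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0), two options not quoted above. The rule names, scores and the tuple layout are made up for illustration. The real implementation applies further checks that are left out here (for example, it wants a minimum number of header and body points before learning as spam).

```python
# Simplified sketch of the auto-learn decision; NOT SpamAssassin's real code.
# Thresholds mirror the documented defaults of
# bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0

# Rules carrying any of these tflags are left out of the auto-learn score.
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_score(hits):
    """Sum the scores of the rules that count towards auto-learning.

    `hits` is a list of (rule_name, score, tflags) tuples, a hypothetical
    representation of the rules a message triggered.
    """
    return sum(score for _name, score, tflags in hits
               if not IGNORED_TFLAGS & set(tflags))


def autolearn_decision(hits):
    """Return 'spam', 'ham' or None (do not train on this message)."""
    score = autolearn_score(hits)
    if score >= SPAM_THRESHOLD:
        return "spam"
    if score < NONSPAM_THRESHOLD:
        return "ham"
    return None


# Illustrative hits: the Bayes rule is ignored, so only 9.0 points count,
# even though the full message score would be 12.5.
hits = [("BAYES_99", 3.5, ["learn"]), ("RULE_A", 9.0, [])]
print(autolearn_decision(hits))  # prints None
```

This also shows why the message-check score and the auto-train score can differ: the Bayes rule contributes to the first but not to the second.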

In this case, Bayes needs a database. By default the Bayes database path is “~/.spamassassin/bayes”, and there should be several database files such as bayes_toks, bayes_seen, etc.

bayes_path /path/to/file (default: ~/.spamassassin/bayes)

Path for Bayesian probabilities databases. Several databases will be created, with this as the base, with _toks, _seen etc. appended to this filename; so the default setting results in files called ~/.spamassassin/bayes_seen, ~/.spamassassin/bayes_toks etc.
By default, each user has their own, in their ~/.spamassassin directory with mode 0700/0600, but for system-wide SpamAssassin use, you may want to reduce disk space usage by sharing this across all users. (However it should be noted that Bayesian filtering appears to be more effective with an individual database per user.)
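As a quick way to check whether those files exist for a given user, something like the following Python sketch could be used. It only covers the two suffixes named in the documentation above (other files may exist too), and the home directory /home/someuser is a placeholder, not a path from this setup.

```python
import os

# Suffixes named in the documentation quoted above; other files may also exist.
SUFFIXES = ("_toks", "_seen")


def bayes_files(bayes_path="~/.spamassassin/bayes", home=None):
    """Expand a bayes_path base into the database file names derived from it.

    `home` lets you check another user's home directory instead of your own.
    """
    if home is not None and bayes_path.startswith("~/"):
        base = os.path.join(home, bayes_path[2:])
    else:
        base = os.path.expanduser(bayes_path)
    return [base + suffix for suffix in SUFFIXES]


# /home/someuser is a hypothetical home directory used for illustration.
for path in bayes_files(home="/home/someuser"):
    print(path, "exists" if os.path.exists(path) else "missing")
```

If every file is reported as missing, either nothing has been learned yet for that user or the databases live somewhere else (a different bayes_path, or a non-file storage backend).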

It appears that there is no Bayes database on my box. I ran "locate bayes", and the only match is the ruleset config file in /usr/share/spamassassin.

When I used SpamAssassin before, I had created a file /etc/sysconfig/spamassassin to start it with these options:

SPAMDOPTIONS="-x -u spamd -H /home/spamd -d"

The Bayes databases were in /home/spamd.

The user prefs file should be in $HOME/.spamassassin/user_prefs, but in our case the preferences are stored in a MySQL database. Maybe the Bayes databases should also be stored in a MySQL database? Anyway…

How did you set up SpamAssassin?
Where are the Bayes databases?
Is Bayes set up at the user level or at the box level?
How do I create these databases?

Thanks
Pascal

I was searching the forums to see how Bayes works. Does Bayes training affect the server as a whole, each domain, or each user?

Found this thread
http://interworx.com/forums/showthread.php?t=451

It’s per user by default. In the NodeWorx Spam config there is an option to make it affect the server as a whole (use one global Bayes database).

lol

That thread is from 03-05-2005, 10:44 AM, and I was about to answer it when I saw it was mine :wink:

If you change this to global and then back, does everything get lost?

Was just looking at my /etc/mail/spamassassin/local.cf and saw this:

#   Use Bayesian classifier (default: 1)
#
# use_bayes 1


#   Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1

Is this set somewhere else? Or should these be uncommented?
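For what it's worth, a commented-out line in local.cf sets nothing, so the documented default (1 for both options) applies unless another config source sets it. A minimal Python sketch of that reading follows. It looks at a single file's text only, handles comments in a simplified way, and the function name is made up for illustration.

```python
# Documented defaults for the two options discussed here.
DEFAULTS = {"use_bayes": "1", "bayes_auto_learn": "1"}


def effective_settings(cf_text, defaults=DEFAULTS):
    """Return the value each option ends up with after reading cf_text.

    Simplified: everything after '#' is treated as a comment, and only
    the options listed in `defaults` are tracked.
    """
    settings = dict(defaults)
    for line in cf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in settings:
            settings[parts[0]] = parts[1].strip()
    return settings


sample = """\
#   Use Bayesian classifier (default: 1)
#
# use_bayes 1

#   Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1
"""
print(effective_settings(sample))
# prints {'use_bayes': '1', 'bayes_auto_learn': '1'}
```

So with the file as shown, both options are still on. Uncommenting the lines would only matter if you wanted to change the values.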

It is set in the database iworx_spam under @global

You should have these entries in the local.cf file :

user_scores_dsn DBI:mysql:iworx_spam;mysql_socket=/PATH/TO/SOCKET/mysql.sock
user_scores_sql_username iworx
user_scores_sql_password xxxxxxx
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '@GLOBAL' OR username = CONCAT('@~',_DOMAIN_) ORDER BY username ASC

auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn DBI:mysql:iworx_spam;mysql_socket=/PATH/TO/SOCKET/mysql.sock
user_awl_sql_username iworx
user_awl_sql_password xxxxxxx
user_awl_sql_table awl

bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn DBI:mysql:iworx_spam;mysql_socket=/PATH/TO/SOCKET/mysql.sock
bayes_sql_username iworx
bayes_sql_password xxxxxxx
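The ORDER BY username ASC in the custom query matters: as far as I understand it, preferences are applied in the order the rows come back, so later rows override earlier ones. With ascending order the '@GLOBAL' rows come first, then the '@~domain' rows, then the user's own rows. Here is a small Python sketch of that merge. The row values are invented, and Python's string ordering stands in for the MySQL collation, which may sort differently in edge cases.

```python
def merge_preferences(rows):
    """Apply (username, preference, value) rows in ascending username order.

    Later rows overwrite earlier ones, which is the assumed behaviour here.
    """
    merged = {}
    for _username, preference, value in sorted(rows, key=lambda r: r[0]):
        merged[preference] = value
    return merged


# Hypothetical rows, in the arbitrary order a table might hold them.
rows = [
    ("bob@example.com", "required_score", "4.0"),
    ("@GLOBAL", "required_score", "5.0"),
    ("@GLOBAL", "use_bayes", "1"),
    ("@~example.com", "required_score", "6.0"),
]
print(merge_preferences(rows))
# prints {'required_score': '4.0', 'use_bayes': '1'}
```

The user's required_score wins over the domain and global values, while use_bayes falls through from '@GLOBAL' because nothing more specific sets it.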

Pascal

[quote=pascal;14368]It is set in the database iworx_spam under @global
You should have these entries in the local.cf file [/quote]

I assume scoring and rules in local.cf or user prefs override what’s in the database. Do my entries go above or below the iWorx SA lines in local.cf?