HOWTO: SpamAssassin monitoring via SIM

We recently needed to add SpamAssassin to the list of services SIM monitors. While this HOWTO is specific to SpamAssassin, it should be readily adaptable to most other system services (exercise left to the reader :wink: ):

  1. First, edit the /usr/local/sim/conf.sim file and add the following variables:

SERV_SPAM="true"                     # SPAMD Service

SPAM_NAME="spamd"                    # name of SpamAssassin service as appears in 'ps'

SPAM_PORT="783"                      # TCP/IP port for SpamAssassin service

SPAM_INIT="/etc/init.d/spamassassin" # path to SpamAssassin service init script

I put each of these variables alongside the other variables of the same “type”, just for the sake of organization (e.g., SERV_SPAM went after the other SERV_* variables, SPAM_NAME went after all the other *_NAME variables, and so on).

  2. Open up /usr/local/sim/sim (this is the main SIM script itself, so take care in here), and look for the following line:
## Lets load our service checks

Beneath that line, add the following code block:

if [ "$SERV_SPAM" = "true" ]; then
        TIC=$[TIC+1]
        TSERVC=$[TSERVC+1]
        EVCOUNT_SPAM="0"
        . $INSPATH/$CHKSERV/spam.chk
        chk_spam
else
        EVCOUNT_SPAM="0"
        SPAM="module disabled"
fi
  3. Edit /usr/local/sim/internals/status.sim and look for:
if [ ! "$TSERVC" == "0" ]; then

Beneath that line, add:

  if [ ! "$SPAM" == "module disabled" ]; then
   echo "SPAM     [$SPAM - $EVCOUNT_SPAM events]"
  fi
  4. Create the file /usr/local/sim/internals/dat/spam.dat and put the following inside:
EVC:0

This is the event counter, and will be incremented each time spamassassin is restarted.

  5. Create the file /usr/local/sim/internals/chk/serv/spam.chk and put the following code inside:
chk_spam() {
SRV="spam"

# Creat/load our $SRV.dat file
srv_dat $SRV

if [ "$LAXCHK" == "true" ]; then
        NET_SPAM=`cat $NS_CACHE | grep -w $SPAM_PORT` >> /dev/null 2>&1
        PS_SPAM=`cat $PS_CACHE | grep -w $SPAM_NAME` >> /dev/null 2>&1
else
        NET_SPAM=`cat $NS_CACHE | grep -w $SPAM_PORT | grep -w $SPAM_NAME` >> /dev/null 2>&1
        PS_SPAM=`cat $PS_CACHE | grep -w $SPAM_NAME` >> /dev/null 2>&1
fi
EVCOUNT_SPAM=`cat $INSPATH/$DAT/$SRV.dat | tr ':' ' ' | grep EVC | awk '{print$2}'` >> /dev/null 2>&1

if [ "$NET_SPAM" == "" ]; then
        echo "$PREFIX SPAM service is offline." >> $SIMLOG
        spam_restart
else
        if [ "$PS_SPAM" == "" ]; then
                echo "$PREFIX SPAM service is offline." >> $SIMLOG
                spam_restart
        else
                echo "$PREFIX SPAM service is online." >>  $SIMLOG
                SPAM="online"
        fi
fi
}

spam_restart() {
if [ "$EVCOUNT_SPAM" -gt "$DISRST" ]; then
      echo "$PREFIX SPAM offline, restart limit exceeded." >> $SIMLOG
      SPAM="offline"
else
 if [ ! -f "$SPAM_INIT" ]; then
         echo "$PREFIX SPAM restart failed, could not find $SPAM_INIT." >> $SIMLOG
         SPAM="restart failed"
         ALERT="true"
         TIE=$[TIE+1]
         EVCOUNT_SPAM=$[EVCOUNT_SPAM+1]
 else
  if [ ! "$RST" = "false" ]; then
           ALERT="true"
           SPAM="restarted"
           TIE=$[TIE+1]
           EVCOUNT_SPAM=$[EVCOUNT_SPAM+1]

           rm -f $INSPATH/$DAT/$SRV.dat
           touch $INSPATH/$DAT/$SRV.dat
           chmod 640 $INSPATH/$DAT/$SRV.dat
           chown root:root $INSPATH/$DAT/$SRV.dat
           echo "EVC:$EVCOUNT_SPAM" >> $INSPATH/$DAT/$SRV.dat
           cd / ; $SPAM_INIT $INIT_ARG >> /dev/null 2>&1
           echo "$PREFIX Restarted SPAM service ($EVCOUNT_SPAM SPAM events today)." >> $SIMLOG
  else
           SPAM="down, restart disabled"
           ALERT="true"
           TIE=$[TIE+1]
           EVCOUNT_SPAM=$[EVCOUNT_SPAM+1]
           echo "$PREFIX SPAM down, restart disabled via conf.sim." >> $SIMLOG
  fi
 fi
fi
}
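
For anyone adapting this to another service, the core of chk_spam boils down to two lookups plus a counter read. Here is a minimal standalone sketch of that logic, not SIM's own code: the function names service_is_up and read_evc are made up for illustration, and it assumes the two cache files hold saved netstat and ps output, as the names $NS_CACHE and $PS_CACHE suggest.

```shell
# Sketch only: mirrors the strict (non-LAXCHK) test in chk_spam.
# service_is_up and read_evc are hypothetical helper names, not part of SIM.

# service_is_up NETSTAT_CACHE PS_CACHE PORT NAME
# Succeeds only if PORT and NAME share a line in the netstat cache
# and NAME also appears in the ps cache.
service_is_up() {
        local ns_cache="$1" ps_cache="$2" port="$3" name="$4"
        grep -w -- "$port" "$ns_cache" | grep -qw -- "$name" || return 1
        grep -qw -- "$name" "$ps_cache"
}

# read_evc DAT_FILE
# Prints the number stored on the "EVC:<n>" line, or 0 if there is none.
read_evc() {
        local count
        count=$(awk -F: '$1 == "EVC" { print $2; exit }' "$1" 2>/dev/null)
        echo "${count:-0}"
}
```

Swapping in a different port and process name is most of what a port to another service would need on the detection side.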

That should be it. No need to restart anything, since SIM runs every five minutes and re-reads the config file each time it runs. Tail the sim log file (tail -f /usr/local/sim/sim.log) and within five minutes, you should see:

[11/23/06 13:50:01]: SPAM service is online.

If you do not see this, something’s not right, so retrace your steps.

Then, shut down SpamAssassin (service spamassassin stop), and watch the sim log file again. Within another five minutes, you should see:

[11/23/06 13:55:01]: SPAM service is offline.
[11/23/06 13:55:01]: Restarted SPAM service (1 SPAM events today).
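
On a box that monitors many services, filtering the log for the SPAM entries can make the two states easier to spot. A small sketch, assuming the log path used above; last_spam_lines is a hypothetical helper name, not something SIM ships:

```shell
# Sketch only: last_spam_lines is a hypothetical helper, not part of SIM.
# Prints the most recent N (default 5) SPAM entries from the SIM log.
last_spam_lines() {
        local log="${1:-/usr/local/sim/sim.log}" n="${2:-5}"
        grep 'SPAM' "$log" | tail -n "$n"
}

# Example: last_spam_lines /usr/local/sim/sim.log 10
```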

That’s it! :slight_smile:

Excellent Tutorial Socheat!

Will be deploying this on our boxes tonight so will let you know how we get on :wink:

Great! Let me know if I missed anything.

[11/23/06 23:45:01]: SPAM service is online.
[11/23/06 23:45:01]: LOAD 0.05 (status good)
[11/23/06 23:45:01]: NETWORK is online.
[11/23/06 23:45:01]: FTP service is online.
[11/23/06 23:45:01]: HTTP service is online.
[11/23/06 23:45:01]: MYSQL service is online.
[11/23/06 23:50:01]: SPAM service is offline.
[11/23/06 23:50:01]: Restarted SPAM service (1 SPAM events today).
[11/23/06 23:50:01]: LOAD 0.13 (status good)
[11/23/06 23:50:01]: NETWORK is online.
[11/23/06 23:50:01]: FTP service is online.
[11/23/06 23:50:01]: HTTP service is online.
[11/23/06 23:50:01]: MYSQL service is online.
Worked an absolute treat Socheat.

Just one thing though, in step 1 you refer to /usr/local/conf.sim which should actually be: /usr/local/sim/conf.sim

Apart from that, all good :smiley:

Good to hear, and fixed :slight_smile:

Great work… However, these are my results:

[root@server1 serv]# tail -f /usr/local/sim/sim.log
[11/24/06 09:00:02]: LOAD 0.42 (status good)
[11/24/06 09:00:02]: NETWORK is online.
[11/24/06 09:00:02]: SPAM service is offline.
[11/24/06 09:00:02]: Restarted SPAM service (1
SPAM events today).
[11/24/06 09:00:02]: FTP service is online.
[11/24/06 09:00:02]: HTTP service is online.
[11/24/06 09:00:02]: SSH service is online.
[11/24/06 09:00:02]: MYSQL service is online.
[11/24/06 09:00:02]: XINET service is online.
[11/24/06 09:05:01]: LOAD 0.35 (status good)
[11/24/06 09:05:01]: NETWORK is online.
[11/24/06 09:05:01]: SPAM service is offline.

[root@server1 serv]# tail -f /usr/local/sim/sim.log
[11/24/06 09:05:01]: LOAD 0.35 (status good)
[11/24/06 09:05:01]: NETWORK is online.
[11/24/06 09:05:01]: SPAM service is offline.
[11/24/06 09:05:01]: Restarted SPAM service (1
SPAM events today).
[11/24/06 09:05:01]: FTP service is online.
[11/24/06 09:05:01]: HTTP service is online.
[11/24/06 09:05:01]: SSH service is online.
[11/24/06 09:05:01]: MYSQL service is online.
[11/24/06 09:05:01]: XINET service is online.

SpamAssassin was reported offline in two consecutive five-minute runs. That doesn’t seem right, does it?

More Info:

[root@server1 sim]# sim -v
Warning: bad syntax, perhaps a bogus ‘-’? See /usr/share/doc/procps-3.2.3/FAQ
/usr/local/sim/internals/chk/serv/spam.chk: line 8: /usr/local/sim/tmp/.sim.nscache: Permission denied
/usr/local/sim/internals/chk/serv/spam.chk: line 9: /usr/local/sim/tmp/.sim.pscache: Permission denied
/usr/local/sim/internals/chk/serv/spam.chk: line 15: /usr/local/sim/internals/dat/spam.dat: Permission denied

/usr/local/sim/internals/chk/serv/spam.chk: line 59: EVC:1: command not found
[11/25/06 16:43:05]: LOAD 1.29 (status good)
[11/25/06 16:43:05]: NETWORK is online.
[11/25/06 16:43:05]: SPAM service is offline.
[11/25/06 16:43:05]: Restarted SPAM service (1
SPAM events today).
[11/25/06 16:43:05]: FTP service is online.
[11/25/06 16:43:05]: HTTP service is online.
[11/25/06 16:43:05]: SSH service is online.
[11/25/06 16:43:05]: MYSQL service is online.
[11/25/06 16:43:05]: XINET service is online.

bump… bump…

Check the permissions on the following files and directories:

/usr/local/sim/tmp/ (drwx------ 2 root root )
/usr/local/sim/internals/dat (drw-r----- 2 root root)
/usr/local/sim/internals/dat/spam.dat (-rw-r----- 1 root root)

Also, what is line 59 in your spam.chk file?
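
A quick way to get mode, owner and group for all three paths at once is ls -ld, which reports on a directory itself rather than its contents. A minimal sketch; show_perms is just an illustrative wrapper name:

```shell
# Sketch only: show_perms is a hypothetical helper name.
# Lists mode, owner and group for each path without descending into directories.
show_perms() {
        ls -ld -- "$@"
}

# Example:
# show_perms /usr/local/sim/tmp /usr/local/sim/internals/dat \
#            /usr/local/sim/internals/dat/spam.dat
```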

I had to modify the permissions on the /usr/local/sim/tmp/ directory (chmod 700). Everything else looked like how you posted it.

Line 59 is:
“EVC:$EVCOUNT_SPAM” >> $INSPATH/$DAT/$SRV.dat

[root@server1 dat]# sim -v
Warning: bad syntax, perhaps a bogus ‘-’? See /usr/share/doc/procps-3.2.3/FAQ
/usr/local/sim/internals/chk/serv/spam.chk: line 8: /usr/local/sim/tmp/.sim.nscache: Permission denied
/usr/local/sim/internals/chk/serv/spam.chk: line 9: /usr/local/sim/tmp/.sim.pscache: Permission denied
/usr/local/sim/internals/chk/serv/spam.chk: line 15: /usr/local/sim/internals/dat/spam.dat: Permission denied

/usr/local/sim/internals/chk/serv/spam.chk: line 59: EVC:1: command not found
[12/02/06 08:57:02]: LOAD 0.78 (status good)
[12/02/06 08:57:02]: NETWORK is online.
[12/02/06 08:57:02]: SPAM service is offline.
[12/02/06 08:57:02]: Restarted SPAM service (1
SPAM events today).
[12/02/06 08:57:02]: FTP service is online.
[12/02/06 08:57:02]: HTTP service is online.
[12/02/06 08:57:02]: SSH service is online.
[12/02/06 08:57:02]: MYSQL service is online.
[12/02/06 08:57:02]: XINET service is online.

Did you copy and paste the spam.chk that I provided? Because that’s not line 59 in my spam.chk file. Line 59 should be:

EVCOUNT_SPAM=$[EVCOUNT_SPAM+1]

The line you gave is line 52, and you’re missing an “echo”:

echo "EVC:$EVCOUNT_SPAM" >> $INSPATH/$DAT/$SRV.dat

I would try copying and pasting what I posted again, and check that your editor didn’t put in line breaks (i.e., auto-wrapped some of the longer lines).
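
To compare a local copy against the line numbers quoted here, it helps to print specific lines with their numbers. A small sketch; show_lines is a hypothetical helper name:

```shell
# Sketch only: show_lines is a hypothetical helper name.
# show_lines FILE N [N...]
# Prints each requested line of FILE prefixed with its line number.
# Line numbers past the end of the file print nothing.
show_lines() {
        local file="$1" n
        shift
        for n in "$@"; do
                awk -v n="$n" 'NR == n { print n ": " $0; exit }' "$file"
        done
}

# Example: show_lines /usr/local/sim/internals/chk/serv/spam.chk 52 59
```

If line 52 does not start with echo, or line 59 is not the EVCOUNT_SPAM increment, the paste was probably re-wrapped somewhere.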

I did copy and paste it but something got lost in the translation. That’s fixed now and now I’m getting this:

SIM 2.5-3 <sim@r-fx.org>

subsys locked, already running SIM ? – aborting.

Ok… Disregard the previous subsys locked error. Now here’s what I’m getting:

[root@server1 sim]# sim -v
Warning: bad syntax, perhaps a bogus ‘-’? See /usr/share/doc/procps-3.2.3/FAQ
'usr/local/sim/internals/chk/serv/spam.chk: line 1: syntax error near unexpected token {
'usr/local/sim/internals/chk/serv/spam.chk: line 1: chk_spam() {
/usr/local/sbin/sim: line 205: chk_spam: command not found
[12/02/06 09:49:18]: SSH service is online.
[12/02/06 09:49:18]: MYSQL service is online.
[12/02/06 09:49:18]: XINET service is online.
[12/02/06 09:49:27]: LOAD 0.51 (status good)
[12/02/06 09:49:27]: NETWORK is online.
[12/02/06 09:49:27]: FTP service is online.
[12/02/06 09:49:27]: HTTP service is online.
[12/02/06 09:49:27]: SSH service is online.
[12/02/06 09:49:27]: MYSQL service is online.
[12/02/06 09:49:27]: XINET service is online.

Double check that first line again. It’s not happy about something in that first line. Make sure there aren’t any stray characters and whatnot at the beginning of the file.
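
One thing worth ruling out: when bash prints an error with a quote where the leading slash of the path should be, the file often has DOS (CRLF) line endings, because the carriage return moves the cursor back to the start of the line. That is only a guess for this case. A minimal sketch for checking; has_crlf is a hypothetical helper name:

```shell
# Sketch only: has_crlf is a hypothetical helper name.
# Succeeds if the file contains at least one carriage return (DOS line ending).
has_crlf() {
        grep -q "$(printf '\r')" "$1"
}

# Show the raw bytes of the first line so stray characters become visible:
# head -n 1 /usr/local/sim/internals/chk/serv/spam.chk | od -c
```

In the od output, a carriage return shows up as \r just before the \n.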

Hmm…

[root@server1 serv]# cat spam.chk
chk_spam() {
SRV="spam"

# Creat/load our $SRV.dat file
srv_dat $SRV

if [ "$LAXCHK" == "true" ]; then
        NET_SPAM=`cat $NS_CACHE | grep -w $SPAM_PORT` >> /dev/null 2>&1
        PS_SPAM=`cat $PS_CACHE | grep -w $SPAM_NAME` >> /dev/null 2>&1
else
        NET_SPAM=`cat $NS_CACHE | grep -w $SPAM_PORT | grep -w $SPAM_NAME` >> /dev/null 2>&1
        PS_SPAM=`cat $PS_CACHE | grep -w $SPAM_NAME` >> /dev/null 2>&1
fi
EVCOUNT_SPAM=`cat $INSPATH/$DAT/$SRV.dat | tr ':' ' ' | grep EVC | awk '{print$2}'` >> /dev/null 2>&1

if [ "$NET_SPAM" == "" ]; then
        echo "$PREFIX SPAM service is offline." >> $SIMLOG
        spam_restart
else
        if [ "$PS_SPAM" == "" ]; then
                echo "$PREFIX SPAM service is offline." >> $SIMLOG
                spam_restart
        else
                echo "$PREFIX SPAM service is online." >>  $SIMLOG
                SPAM="online"
        fi
fi
}

spam_restart() {
if [ "$EVCOUNT_SPAM" -gt "$DISRST" ]; then
      echo "$PREFIX SPAM offline, restart limit exceeded." >> $SIMLOG
      SPAM="offline"
else
 if [ ! -f "$SPAM_INIT" ]; then
         echo "$PREFIX SPAM restart failed, could not find $SPAM_INIT." >> $SIMLOG
         SPAM="restart failed"
         ALERT="true"
         TIE=$[TIE+1]
         EVCOUNT_SPAM=$[EVCOUNT_SPAM+1]
 else
  if [ ! "$RST" = "false" ]; then
           ALERT="true"
           SPAM="restarted"
           TIE=$[TIE+1]
           EVCOUNT_SPAM=$[EVCOUNT_SPAM+1]

           rm -f $INSPATH/$DAT/$SRV.dat
           touch $INSPATH/$DAT/$SRV.dat
           chmod 640 $INSPATH/$DAT/$SRV.dat
           chown root:root $INSPATH/$DAT/$SRV.dat
           echo "EVC:$EVCOUNT_SPAM" >> $INSPATH/$DAT/$SRV.dat
           cd / ; $SPAM_INIT $INIT_ARG >> /dev/null 2>&1
           echo "$PREFIX Restarted SPAM service ($EVCOUNT_SPAM SPAM events today)." >> $SIMLOG
  else
           SPAM="down, restart disabled"
           ALERT="true"
           TIE=$[TIE+1]
           EVCOUNT_SPAM=$[EVCOUNT_SPAM+1]
           echo "$PREFIX SPAM down, restart disabled via conf.sim." >> $SIMLOG
  fi
 fi
fi
}

I’m not sure what’s up, oaf; that looks correct. I copied and pasted your spam.chk into mine, and I don’t get any errors. You could try starting over by forcing a re-install of the SIM rpm, and then retracing your steps.

rpm -Uvh --force http://updates.interworx.com/iworx/RPMS/noarch/sim-2.5-6.iworx.noarch.rpm

[root@server1 ~]# rpm -Uvh --force http://updates.interworx.com/iworx/RPMS/noarch/sim-2.5-6.iworx.noarch.rpm
Retrieving http://updates.interworx.com/iworx/RPMS/noarch/sim-2.5-6.iworx.noarch.rpm
error: cannot open Packages index using db3 - Permission denied (13)
error: cannot open Packages database in /var/lib/rpm

Sounds like you have more issues than just SIM issues. :slight_smile: Is one (or more) of your partitions full?

No… I’ve got tons of space.

Disabled les (http://www.rfxnetworks.com/les.php) and now I’m rolling.