WTCS.ORG

Threshold Monitoring


MRTG How-to                 SNMPMON How-to


Overview:

Well, you now have the ability to monitor a particular system performance statistic, and that's just great!  But what if that value exceeds (or drops below) a threshold?  For instance, what if the "free disk space OID" value on that NT server drops below 100 (as in MB)?  Wouldn't you like to be notified so that you could proactively deal with the situation, instead of finding out the hard way?

Basically, the focus of this page is to tell you how you can GET an SNMP counter, compare it to a threshold level, and take action if the SNMP counter is above or below the threshold level.  Actions could include some or many of the following (let's assume that disk space drops below the set threshold of 100MB):

As you can see, there are a number of things that can be done, and a number of ways to do them.  In this section, I will deal with two, both free.  One (called SNMPMON.EXE) comes with the Windows NT Resource Kit, and the other is good old MRTG.


MRTG

Since the release of version 2.7.4, it has been possible to spawn a process (i.e. run a program) if a value returned by the MRTG SNMP GET (or helper script) is higher or lower than a preset threshold value.  I will show you how to call a batch file that starts a Perl script, which in turn sends SMTP (Internet) email.  The image below shows an actual message sent by MRTG/BLAT.EXE using the files you can download below:

Something you should know about:

When MRTG detects a threshold breach, and provided you have the config file set up to call a program (with the ThreshMinI/ThreshProgI/ThreshMaxO/ThreshProgO options) it will run a program to send an email message.  This will slow down your polling process.  You should have a beefy system if you want to enable this!

I have modified SMTPMAIL.PL so that it checks whether it has already sent a message within the current clock hour, and if it has, it will not send another.  This gets around the problem of receiving an email every 5 minutes when the MRTG scan is performed.  You can see in the screenshot below how it sends every hour.  However, this is not 100% accurate: a threshold breach might occur at 55 minutes past the hour, and an email would be sent.  Then another breach might occur at 5 minutes past the next hour, and since it is in another hour (24 hour clock), another would be sent.  Still, it's LOTS better than every 5 minutes!

In the screenshot below, you can see that there was a CPU breach, but it was the second in the hour, so nothing was done.  Then there was a disk space breach, and it was the first in the hour, so threshold alerts were sent!
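The once-per-clock-hour rule described above can be sketched as follows. This is only an illustration in Python, not the code in SMTPMAIL.PL; the function name `should_send` is made up, and comparing the date as well as the hour is an assumption.

```python
from datetime import datetime

def should_send(last_sent, now):
    """Return True unless an alert already went out in the same clock hour.

    Assumption: "same hour" means same calendar date and same hour on the
    24 hour clock. The real script may compare only the hour.
    """
    if last_sent is None:
        return True
    same_hour = last_sent.date() == now.date() and last_sent.hour == now.hour
    return not same_hour

first = datetime(2001, 1, 1, 10, 55)   # breach at 55 minutes past the hour
second = datetime(2001, 1, 1, 10, 58)  # same clock hour, suppressed
third = datetime(2001, 1, 1, 11, 5)    # next clock hour, sent again

print(should_send(None, first))    # True
print(should_send(first, second))  # False
print(should_send(first, third))   # True
```

The last case is the inaccuracy mentioned above: the two alerts are only 10 minutes apart, but they fall in different clock hours, so both go out.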

 

Prerequisites:

1) Be sure you have at least v2.7.4 of MRTG.  Get the latest MRTG from here.

2) Download the following files as you will need them to get this going ...

MRTG Disk Config file: provides a sample of the threshold parameters in a disk monitoring config file

thresholds.zip: put the contents of this zip file in the mrtg/thresholds directory

sendmail.zip: put the contents of this zip file in the mrtg/sendmail directory

3) Make a subdirectory under MRTG called SENDMAIL, and unzip the contents of sendmail.zip into it.

4) Make a subdirectory under MRTG called THRESHOLDS, and unzip the contents of thresholds.zip into it.

Note:  You can put these files in any directory, but you will need to edit them to reflect your directory structure.

 

Theory of Operation:

An MRTG config file must be properly configured to spawn a process on a threshold breach.  In this case, the process is either threshunder.bat or threshover.bat, depending on whether the value returned by MRTG is lower than ThreshMinI or higher than ThreshMaxO (assuming that the two OIDs queried by MRTG are the same).  I added the following parameters to the base disk monitoring config file (you can see these in win2kfs1_disk.cfg) ...

Note: I do NOT add the ThreshDir: config file option.  I may add it at a later date as I modify smtpmail.pl.  It serves no useful purpose at this point.

# Use ThreshMinI for minimum threshold breach notification
# i.e. if under ThreshMinI, then threshunder.bat will run
ThreshMinI[SERVER_DSK_D]: 5000
ThreshProgI[SERVER_DSK_D]: f:\systap\thresholds\threshunder.bat

# Use ThreshMaxO for maximum threshold breach notification
# i.e. if over ThreshMaxO, then threshover.bat will run
ThreshMaxO[SERVER_DSK_D]: 6000
ThreshProgO[SERVER_DSK_D]: f:\systap\thresholds\threshover.bat

Change only the threshold values and the batch file paths (these were highlighted in green on the original page).
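To make the decision concrete, here is a small Python sketch of the logic these four options express. It is not MRTG's own code: the function name `threshold_command` is made up, the strict comparisons are an assumption, and it uses a single value for both checks, which matches the case where both OIDs are the same. The three arguments passed to the program (target name, threshold, current value) follow the description of the threshold subroutine on this page.

```python
UNDER_PROG = r"f:\systap\thresholds\threshunder.bat"
OVER_PROG = r"f:\systap\thresholds\threshover.bat"

def threshold_command(target, value, thresh_min=None, thresh_max=None):
    """Return the command line to spawn for a breach, or None if in range.

    Assumption: a breach means strictly below thresh_min or strictly
    above thresh_max.
    """
    if thresh_min is not None and value < thresh_min:
        return [UNDER_PROG, target, str(thresh_min), str(value)]
    if thresh_max is not None and value > thresh_max:
        return [OVER_PROG, target, str(thresh_max), str(value)]
    return None

# With the sample values above: 4200 is under 5000, 5500 is in range,
# and 6100 is over 6000.
print(threshold_command("server_dsk_d", 4200, 5000, 6000))
print(threshold_command("server_dsk_d", 5500, 5000, 6000))
print(threshold_command("server_dsk_d", 6100, 5000, 6000))
```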

In our world, threshunder.bat and threshover.bat do two things:

First, they call set_vars.bat, which translates the returned MRTG value and the ThreshMinI and ThreshMaxO values into more friendly names.  Since there is no "friendly name" parameter in the MRTG config repertoire, we translate the $router (or target name) into something more easily interpreted, and set it into an environment variable.  For example, without set_vars.bat, MRTG's threshold subroutine returns server_dsk_d.cfg's $router name as server_dsk_d (not much room to describe what you're monitoring).  By using set_vars.bat, we can translate it into an environment variable called DESC, which we set to Remaining Disk (D:) Space (MB) on System: WIN2KFS1.  Much better, huh?  Then, we translate the two other values returned by the MRTG.PL threshold subroutine into environment variables called BREACH_VALUE and ACTUAL_VALUE.  Finally, set_vars.bat returns control to either threshover.bat or threshunder.bat, depending on which one called it in the first place.

You will need to edit set_vars.bat and, following the examples, create an alias for each MRTG target name from your config files.  For example, in the file win2kfs1_disk.cfg, notice that the section for the disk D: target looks like this:

Target[server_dsk_d]: blah blah blah.

So, in set_vars.bat, we do the following...

if "%1"=="server_dsk_d" set desc=Remaining Disk (D:) Space (MB) on System: WIN2KFS1
(converts target name server_dsk_d to friendly name)

Now, the description is much more meaningful.  Note that you must edit set_vars.bat, and insert the target name from your MRTG config files (where you want threshold monitoring) and the new "friendly name".  Change only the target name and the friendly name (these were highlighted in green on the original page).  You would do this for every target in every config file you have enabled thresholds on.  The set_vars.bat file has several examples.

Note: no two targets can have the same name at this point.  I will address this shortly.  I suggest including the system name in your target name (my examples do not have this!)
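The translation that set_vars.bat performs amounts to a lookup table keyed on the target name. Here is a rough Python equivalent, for illustration only: the `set_vars` function and the fallback to the raw target name are made up, and the argument order (target, breached threshold, actual value) is an assumption based on the description above.

```python
# One entry per target in every config file with thresholds enabled.
DESCRIPTIONS = {
    "server_dsk_d": "Remaining Disk (D:) Space (MB) on System: WIN2KFS1",
}

def set_vars(args):
    """Map the three values handed over by MRTG to friendly variable names."""
    target, breach_value, actual_value = args[:3]
    return {
        # Hypothetical fallback: keep the raw target name if no alias exists.
        "DESC": DESCRIPTIONS.get(target, target),
        "BREACH_VALUE": breach_value,
        "ACTUAL_VALUE": actual_value,
    }

env = set_vars(["server_dsk_d", "5000", "4200"])
print(env["DESC"])
print(env["BREACH_VALUE"], env["ACTUAL_VALUE"])
```

Because the table is keyed on the target name alone, two targets with the same name would collide, which is why the note above suggests putting the system name in the target name.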

Secondly, threshunder.bat and threshover.bat call a Perl script I wrote called smtpmail.pl, which creates and writes to a threshold breach log file (in the thresholds directory, called mrtgthresh.log).  Then, it generates the subject line and body of an SMTP message (in the thresholds directory, called mailmsg.txt), and sends it to a predefined user.

In order to support hourly messages (not every 5 minutes!) and multiple devices, I also had to modify SMTPMAIL.PL so that it "knew" about which devices it sent an alert on.  In the thresholds directory, it creates files for each device (target[]) it watches.  See the image below.
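One way to keep per-device state like this is a small marker file per target that records the clock hour of the last alert. The Python sketch below shows the idea only. It is not how SMTPMAIL.PL is written, and the `.last` file name and the `already_alerted_this_hour` function are hypothetical.

```python
import os
import tempfile
import time

def already_alerted_this_hour(state_dir, target, now=None):
    """Return True if this target was already alerted on in the current hour.

    Otherwise record the current hour in the target's marker file and
    return False. `now` is seconds since the epoch (None means right now).
    """
    stamp = time.strftime("%Y%m%d%H", time.localtime(now))
    path = os.path.join(state_dir, target + ".last")  # hypothetical file name
    previous = None
    if os.path.exists(path):
        with open(path) as handle:
            previous = handle.read().strip()
    if previous == stamp:
        return True
    with open(path, "w") as handle:
        handle.write(stamp)
    return False

state_dir = tempfile.mkdtemp()  # stands in for the thresholds directory
t = 1_000_000_000
print(already_alerted_this_hour(state_dir, "server_dsk_d", t))  # False
print(already_alerted_this_hour(state_dir, "server_dsk_d", t))  # True
print(already_alerted_this_hour(state_dir, "server_cpu", t))    # False
```

Keeping one file per target means a CPU alert in a given hour does not suppress a disk alert in the same hour.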

EDIT SMTPMAIL.PL!  You must edit it, and make changes in the section where the email sender and recipients are defined.  You will see in the smtpmail.pl file you downloaded, that a system called garthkw@pacbell.net sends to garth.williams@wtcs.org.  You can point to the message file too!  Here is a snip of that section.  Change only the sections in green highlight.

$recipients = "garth.williams\@wtcs.org";
$ccaddress = "";
$fromsender = "garthkw\@yourcompany.com";
$returnto = "SYSTAP\@wtcs.org";
$subject = "$message";
$port = "25";
$body = "f:\\systap\\thresholds\\mailmsg.txt ";
$blatpath = "f:\\systap\\sendmail\\blat.exe ";
# $server = "smtpserver@yourcompany.com";

Be sure you set  the mail server and sender using the blat -install option (see the sendmail/install.bat).  $recipients can be a list (comma separated).  $ccaddress takes the same format as $recipients.  $fromsender MUST be known by the SMTP server!  Obviously, the paths must be correct!  Only change $port if your SMTP server is running on another (non-standard) port.
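To show how these settings could end up on a Blat command line, here is a hedged Python sketch. It is not taken from smtpmail.pl. The function name `build_blat_command` is made up, and the flags used (`-to`, `-f`, `-subject`, `-cc`, `-port`) are assumptions that you should check against the help output of your own Blat version before relying on them.

```python
def build_blat_command(blatpath, body, recipients, fromsender, subject,
                       ccaddress="", port="25"):
    """Build an argument list for Blat from the settings shown above.

    Assumption: the flag names below match your Blat version.
    """
    command = [
        blatpath.strip(),    # path to blat.exe
        body.strip(),        # message file, e.g. mailmsg.txt
        "-to", recipients,   # comma separated list of recipients
        "-f", fromsender,    # must be known by the SMTP server
        "-subject", subject,
    ]
    if ccaddress:
        command += ["-cc", ccaddress]
    if port != "25":
        # Only needed when the SMTP server listens on a non-standard port.
        command += ["-port", port]
    return command

command = build_blat_command(
    "f:\\systap\\sendmail\\blat.exe ",
    "f:\\systap\\thresholds\\mailmsg.txt ",
    "garth.williams@wtcs.org",
    "garthkw@yourcompany.com",
    "Threshold breach",
)
print(command)
```

The list could then be handed to something like `subprocess.run(command, check=True)`. Passing a list instead of one long string avoids quoting trouble when the subject contains spaces.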

While not extremely difficult, this section is hard to document, so you may have a little difficulty, and need help.  Feel free to email me if you need some help here!  I will do my best to help!  Good Luck!


SNMPMON

SNMPMON.EXE is a utility provided by Microsoft in their Windows NT Resource Kit, and is included in SNMP4NT.ZIP.   Here is Microsoft's description of SNMPMON.EXE:

SNMP Monitor is a utility that can monitor any SNMP MIB variables across any number of SNMP nodes. It can then optionally log query results to any ODBC data source (such as SQL Server), automatically creating any necessary tables. Logging can be enabled for all queries or limited to particular thresholds, and thresholds can be either edge or level triggered.

Rudimentary conditionals are also possible. SNMP Monitor can execute arbitrary command lines based on whether or not the node responded to the query, whether or not the node supported the requested variable, and whether the value was greater than, less than, or equal to a specified constant.

SNMP Monitor is a stand-alone executable that accepts a configuration file as input.
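The difference between "level" and "edge" triggered thresholds in the description above can be illustrated with a short Python sketch. It shows the general meaning of the two terms, not how SNMPMON itself is implemented; the function names and the sample poll values are made up.

```python
def level_triggered(values, threshold):
    """Fire on every poll where the value is below the threshold."""
    return [value < threshold for value in values]

def edge_triggered(values, threshold):
    """Fire only on the poll where the value crosses below the threshold."""
    fired = []
    was_below = False
    for value in values:
        below = value < threshold
        fired.append(below and not was_below)
        was_below = below
    return fired

polls = [150, 90, 80, 120, 70]  # e.g. free disk space in MB on five polls
print(level_triggered(polls, 100))  # [False, True, True, False, True]
print(edge_triggered(polls, 100))   # [False, True, False, False, True]
```

With a level trigger the second and third polls both fire. With an edge trigger only the second does, because the value was already below the threshold on the third.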

Sounds cool, huh?  Well it is, but there are a few drawbacks (as per Microsoft), along with my comments:

You will find a subdirectory called SNMPMON in SNMP4NT.ZIP.  It contains what I have done so far.  I have included RaiHan Kibria's Hex Editor, as well as a sample SNMPMON config file.  It only has the basics now (Disk C monitoring, start by typing "snmpmon dsk_c.mon").  I will update it in the coming weeks!  I will be writing a small Access Database, and making it update that as well!  Stay tuned!


Other Methods of Notification:

snmpmon.exe detects a threshold breach, calls nttrapgen.exe, which sends a trap to traprcvr.exe, which in turn spawns an email or runs another program.

mailpage monitors a MAPI mailbox and, upon receipt of a "network 911" email, pages a technician with the subject line.

snmpmon.exe detects a threshold breach and spawns a syslog client, which sends a syslog message to a syslog daemon running on an NT server.  Event log post and Access database update.


Other tasks that could be spawned:

DISK PURGE

