Is anyone able to tell me what the value for server load is based on? I keep getting these notifications when the server is responding fine, but I don’t really want to raise the alert threshold until I know exactly what is being measured and how.
We use the system load numbers. When you get the system load, it returns the 1 minute, 5 minute, and 15 minute load average. Read a bit about the “Load Average” here - http://www.linuxjournal.com/article/9001 - it’s not exactly CPU percentage.
By default, the health monitor checks the 5 minute number (defined in the “period” in the configuration) to see if it exceeds the threshold. If it does, you get an email.
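To make the check concrete, here is a minimal sketch of that comparison in Python. It is not the health monitor's actual code: the names `PERIOD_INDEX`, `load_exceeds_threshold`, and `check_current_load`, and the default threshold of 4.0, are made up for illustration. The only real call is `os.getloadavg()`, which returns the 1, 5, and 15 minute load averages on Unix-like systems.

```python
import os

# Position of each averaging window in the (1 min, 5 min, 15 min) tuple.
PERIOD_INDEX = {1: 0, 5: 1, 15: 2}


def load_exceeds_threshold(load_averages, period=5, threshold=4.0):
    """Return True if the load average for `period` minutes is above `threshold`.

    `period` and `threshold` stand in for the monitor's configuration values;
    the defaults here are illustrative assumptions, not the real defaults.
    """
    return load_averages[PERIOD_INDEX[period]] > threshold


def check_current_load(period=5, threshold=4.0):
    """Read the live load averages and report whether an alert would fire."""
    loads = os.getloadavg()  # Unix-like systems only
    return load_exceeds_threshold(loads, period, threshold)
```

With this sketch, a server reporting load averages of (0.5, 3.2, 1.0) would trigger an alert at a threshold of 3.0 when the period is 5, but not when the period is 1 or 15.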
Load average is a strange indicator. From what I have read it can be kind of pointless, but a good rule of thumb is that it shouldn’t be higher than 1 × the number of CPU cores. So on a dual-core server the load average shouldn’t get much higher than 2.
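If you want to apply that rule of thumb, one rough way is to divide the load average by the core count and see whether the result is above 1. This is just a sketch: `load_per_core` is a hypothetical helper, and the "above 1.0 is busy" cutoff is the rule of thumb from this post, not a hard limit.

```python
import os


def load_per_core(load_average, cpu_count=None):
    """Normalize a load average by the number of CPU cores.

    If `cpu_count` is not given, fall back to os.cpu_count(), or 1 when the
    core count cannot be determined.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return load_average / cpu_count


def is_over_rule_of_thumb(load_average, cpu_count=None):
    """True when the load is higher than 1 x the number of cores."""
    return load_per_core(load_average, cpu_count) > 1.0
```

For example, a load of 2.0 on a dual-core server works out to 1.0 per core, which sits right at the limit, while a load of 3.0 on the same server is 1.5 per core.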
An “ok” load average does vary from server to server, particularly with the number of CPU cores. It’s definitely not pointless: I’ve never seen a healthy server with load > 75. But I have seen busy, but still healthy, quad-core servers with load average between 10 and 20. Much higher than that, though, and you tend to see serious performance degradation.