System Health - Server Load

hacman · August 28, 2010, 1:51am

Hi All,

Is anyone able to tell me what the value for server load is based on? I keep getting these notifications when the server is responding fine, but I don’t really want to raise the alert level until I know exactly what is being measured and how.

Any help greatly appreciated,

Thanks,

Jon

IWorx-Tim-Pgh · August 30, 2010, 6:11am

We use the system load numbers. When you query the system load, it returns the 1-minute, 5-minute, and 15-minute load averages. Read a bit about the “Load Average” here: http://www.linuxjournal.com/article/9001. It’s not exactly a CPU percentage.

By default, the health monitor checks the 5-minute number (set by the “period” in the configuration) to see whether it exceeds the threshold. If it does, you get an email.
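
For anyone who wants to see the numbers being compared, here is a minimal Python sketch of that kind of check. It is not the health monitor's actual code: the function name, the `PERIOD_INDEX` mapping, and the threshold of 4.0 are all assumptions made for illustration. Only `os.getloadavg()` is a real standard-library call (Unix only).

```python
import os

# Position of each period (in minutes) within the tuple that
# os.getloadavg() returns: (1 min, 5 min, 15 min).
PERIOD_INDEX = {1: 0, 5: 1, 15: 2}


def load_exceeds_threshold(load_averages, threshold, period=5):
    """Return True if the load average for `period` is above `threshold`.

    `load_averages` is a (1 min, 5 min, 15 min) tuple. The default
    period of 5 mirrors the default described above; `threshold` is a
    hypothetical stand-in for whatever alert level is configured.
    """
    return load_averages[PERIOD_INDEX[period]] > threshold


if __name__ == "__main__":
    loads = os.getloadavg()  # Unix only; raises OSError if unavailable
    threshold = 4.0          # hypothetical alert level
    if load_exceeds_threshold(loads, threshold):
        print(f"ALERT: 5-minute load {loads[1]:.2f} is above {threshold}")
    else:
        print(f"OK: 5-minute load {loads[1]:.2f} is within {threshold}")
```

Running `uptime` on the server shows the same three numbers, which makes it easy to compare them against the alert level by hand.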

That help?

Tim

Justec · August 30, 2010, 6:16pm

Load Average is a very weird indicator. From what I’ve read it can be kind of pointless, but a good rule of thumb is that it shouldn’t be higher than 1 x the number of CPU cores. So if you have a dual-core server, the load average shouldn’t get much higher than 2.
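
That rule of thumb can be sketched in a few lines of Python. The function names and the `factor` parameter are hypothetical, introduced only for this example. `factor=1.0` encodes the "1 x number of cores" guideline, and it is only a guideline, so treat the result as a hint rather than a verdict.

```python
import os


def load_per_core(load_average, cpu_count):
    """Normalize a load average by the number of CPU cores."""
    return load_average / cpu_count


def is_overloaded(load_average, cpu_count, factor=1.0):
    """Apply the rule of thumb: load should not exceed factor x cores.

    `factor` is a hypothetical tuning knob; 1.0 matches the
    "1 x number of CPUs" guideline.
    """
    return load_average > factor * cpu_count


if __name__ == "__main__":
    cores = os.cpu_count() or 1          # cpu_count() may return None
    five_minute = os.getloadavg()[1]     # Unix only
    print(f"5-minute load per core: {load_per_core(five_minute, cores):.2f}")
    print(f"Above rule of thumb: {is_overloaded(five_minute, cores)}")
```

For the dual-core example, `is_overloaded(2.5, 2)` is true while `is_overloaded(2.0, 2)` is not.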

IWorx-Paul · August 30, 2010, 10:31pm

An “ok” load average does vary from server to server, particularly with the number of CPU cores. It’s definitely not pointless, in that I’ve never seen a healthy server with load > 75. But I have seen busy, yet still healthy, quad-core servers with a load average between 10 and 20. Much higher than that, though, and you tend to see serious performance degradation.

Paul

hacman · August 31, 2010, 6:56am

Thanks for the clarification on that one.

It all makes sense now.

Jon