Bizarre entries in /var/log/httpd/error_log

I’m getting a lot of these lately in my error log. They normally coincide with a heavy load spike and, in some cases, a quick automatic httpd reload courtesy of monit:

[Mon Dec 05 22:35:26 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(80af480, “SERVER”) failed in watchCleanUpHash()

Now as far as I know those aren’t Apache errors but Java errors. But why are they suddenly appearing there, and what’s “watchCleanUpHash”? I don’t use Java to any great extent myself. Is it a Nodeworx/Interworx command somewhere? The “SERVER” tag would seem to imply that the error is not tied to a specific domain.

References I’ve managed to find so far:
http://community.digi-net.com/printthread.php?t=226

Edit: Apparently 20014 is a generic code for an unknown Apache error. Is it possible that the Apache version installed from yum has a bug in it? These errors seem to have appeared out of the blue. “watchCleanUpHash” doesn’t match any code of mine that I recognise on the sites. A quick httpd -v returns: Server version: Apache/2.0.54 Server built: Apr 25 2005 20:44:09

They are actually not Java errors, Ivery; they are from mod_watch (http://www.snert.com/Software/mod_watch/), which is the Apache module we use for bandwidth accounting. While they are logged as “critical” errors, the worst that could happen iworx-cp wise is that the webserver graph on the system graphs page would be off.

I’m not sure of a “fix”, and my quick Google search didn’t turn up much either. Are there hundreds of these, or simply a few here and there during heavy spikes?

Chris

Thanks Chris, they are appearing during heavy load. What seems to be happening is that at extremely busy times httpd becomes unresponsive, those errors appear all over the logs, and then monit kicks in to restart httpd because of the connectivity fallout (it checks the outside connection every 2 minutes).

We’re currently using around 200 child processes (prefork config) at peak times, so I’ve just bumped the max number to 256.

Nothing has changed config wise for many months now, and these errors started about a month back. I think the errors are due either to a software package update (presumably an automatic yum update) or to the recent load spikes. It’s hard for me to say without putting httpd into debug mode, which is hard to do on a server that is in use.

Just checked the error logs; it looks like I get that error for each child process, e.g. (not all listed):

[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “SERVER”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “…co.uk”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “SERVER”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “…co.uk”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “SERVER”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “…co.uk”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “SERVER”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “…co.uk”) failed in watchCleanUpHash()
[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: shGetLockedEntry(819bbf0, “SERVER”) failed in watchCleanUpHash()
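
To put a number on the "hundreds or a few" question, a small script along these lines could tally the mod_watch failures per timestamp and per key. This is only a rough sketch: the `summarize` function and the regular expression are hypothetical, written against the line format quoted above and nothing else, and the sample lines are copied from this post rather than read from the real log.

```python
import re
from collections import Counter

# Matches the shGetLockedEntry lines quoted above. Straight and curly quotes
# are both accepted around the key, because the forum may have converted them.
PATTERN = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[crit\] \(20014\).*?'
    r'shGetLockedEntry\((?P<addr>[0-9a-f]+), ["“](?P<key>[^"”]*)["”]\) '
    r'failed in watchCleanUpHash\(\)'
)


def summarize(lines):
    """Count matching failures per timestamp and per key (e.g. SERVER)."""
    per_timestamp = Counter()
    per_key = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            per_timestamp[match.group("ts")] += 1
            per_key[match.group("key")] += 1
    return per_timestamp, per_key


# Sample input taken from the excerpts in this thread.
sample = [
    '[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: '
    'shGetLockedEntry(819bbf0, "SERVER") failed in watchCleanUpHash()',
    '[Tue Dec 06 16:30:04 2005] [crit] (20014)Error string not specified yet: '
    'shGetLockedEntry(819bbf0, "SERVER") failed in watchCleanUpHash()',
    '[Tue Dec 06 16:30:07 2005] [warn] child process 29168 still did not exit, '
    'sending a SIGTERM',
]

per_timestamp, per_key = summarize(sample)
for ts, count in per_timestamp.most_common():
    print(ts, count)
for key, count in per_key.most_common():
    print(key, count)
```

For the real log, `summarize(open("/var/log/httpd/error_log", errors="replace"))` should work the same way, since the function only needs an iterable of lines.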

…and then I get this for each individual process:

[Tue Dec 06 16:30:07 2005] [warn] child process 29168 still did not exit, sending a SIGTERM

Followed by:

[Tue Dec 06 16:30:25 2005] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Dec 06 16:30:26 2005] [warn] RSA server certificate CommonName (CN) `localhost.localdomain' does NOT match server name!?
[Tue Dec 06 16:30:29 2005] [notice] mod_python: Creating 32 session mutexes based on 200 max processes and 0 max threads.
[Tue Dec 06 16:30:30 2005] [warn] RSA server certificate CommonName (CN) `localhost.localdomain' does NOT match server name!?
[Tue Dec 06 16:30:30 2005] [notice] Digest: generating secret for digest authentication …
[Tue Dec 06 16:30:30 2005] [notice] Digest: done
[Tue Dec 06 16:30:31 2005] [notice] Apache configured – resuming normal operations
[Tue Dec 06 16:31:02 2005] [error] server reached MaxClients setting, consider raising the MaxClients setting
[Tue Dec 06 19:00:40 2005] [notice] caught SIGTERM, shutting down
[Tue Dec 06 19:00:43 2005] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Dec 06 19:00:43 2005] [warn] RSA server certificate CommonName (CN) `localhost.localdomain' does NOT match server name!?
[Tue Dec 06 19:00:44 2005] [notice] mod_python: Creating 32 session mutexes based on 256 max processes and 0 max threads.
[Tue Dec 06 19:00:45 2005] [warn] RSA server certificate CommonName (CN) `localhost.localdomain' does NOT match server name!?
[Tue Dec 06 19:00:45 2005] [notice] Digest: generating secret for digest authentication …
[Tue Dec 06 19:00:45 2005] [notice] Digest: done
[Tue Dec 06 19:00:46 2005] [notice] Apache configured – resuming normal operations
(which I’m guessing is the reboot, as it’s 16 secs after the last error).