Just curious to know why Apache restarts by itself a few times

Hello,

In our /var/log/httpd/error_log we see a SIGTERM that causes Apache to be restarted a few times a day, and sometimes several times within 10 minutes.

For example, it restarted 5 times in 5 minutes:

[Fri Apr 07 19:39:55 2006] [notice] caught SIGTERM, shutting down
[Fri Apr 07 19:39:56 2006] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Apr 07 19:39:56 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:39:56 2006] [notice] mod_security/1.9.2 configured - Apache
[Fri Apr 07 19:39:56 2006] [notice] Digest: generating secret for digest authentication ...
[Fri Apr 07 19:39:56 2006] [notice] Digest: done
[Fri Apr 07 19:39:57 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:39:57 2006] [notice] NOYB configured -- resuming normal operations
[Fri Apr 07 19:40:06 2006] [notice] caught SIGTERM, shutting down
[Fri Apr 07 19:40:07 2006] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Apr 07 19:40:07 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:40:07 2006] [notice] mod_security/1.9.2 configured - Apache
[Fri Apr 07 19:40:07 2006] [notice] Digest: generating secret for digest authentication ...
[Fri Apr 07 19:40:07 2006] [notice] Digest: done
[Fri Apr 07 19:40:08 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:40:08 2006] [notice] NOYB configured -- resuming normal operations
[Fri Apr 07 19:40:29 2006] [notice] caught SIGTERM, shutting down
[Fri Apr 07 19:40:30 2006] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Apr 07 19:40:30 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:40:30 2006] [notice] mod_security/1.9.2 configured - Apache
[Fri Apr 07 19:40:30 2006] [notice] Digest: generating secret for digest authentication ...
[Fri Apr 07 19:40:30 2006] [notice] Digest: done
[Fri Apr 07 19:40:31 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:40:31 2006] [notice] NOYB configured -- resuming normal operations
[Fri Apr 07 19:40:38 2006] [notice] caught SIGTERM, shutting down
[Fri Apr 07 19:40:41 2006] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Apr 07 19:40:41 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:40:41 2006] [notice] mod_security/1.9.2 configured - Apache
[Fri Apr 07 19:40:41 2006] [notice] Digest: generating secret for digest authentication ...
[Fri Apr 07 19:40:41 2006] [notice] Digest: done
[Fri Apr 07 19:40:42 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:40:42 2006] [notice] NOYB configured -- resuming normal operations
[Fri Apr 07 19:44:25 2006] [notice] caught SIGTERM, shutting down
[Fri Apr 07 19:44:26 2006] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Apr 07 19:44:26 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:44:26 2006] [notice] mod_security/1.9.2 configured - Apache
[Fri Apr 07 19:44:26 2006] [notice] Digest: generating secret for digest authentication ...
[Fri Apr 07 19:44:26 2006] [notice] Digest: done
[Fri Apr 07 19:44:27 2006] [warn] RSA server certificate CommonName (CN) localhost.localdomain' does NOT match server name!?
[Fri Apr 07 19:44:27 2006] [notice] NOYB configured -- resuming normal operations
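
To see how these restarts cluster over a longer period, the SIGTERM entries can be counted per hour. This is a minimal sketch, assuming the error_log line layout shown above; the function name sigterm_per_hour is made up for illustration.

```shell
# Count "caught SIGTERM" entries per hour in an Apache error log.
# Assumed line shape:
#   [Fri Apr 07 19:39:55 2006] [notice] caught SIGTERM, shutting down
# so $2 = month, $3 = day, $4 = HH:MM:SS.
sigterm_per_hour() {
    awk '/caught SIGTERM/ {
             split($4, t, ":")                  # t[1] is the hour
             n[$2 " " $3 " " t[1] ":00"]++
         }
         END { for (k in n) print n[k], k }' "$1" |
        sort -k2    # sorted by the timestamp text, not strictly by calendar order
}

# Usage (hypothetical path): sigterm_per_hour /var/log/httpd/error_log
```

Each output line is a count followed by the month, day and hour, which can then be compared with cron schedules or control panel activity at the same times.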

We are absolutely sure we didn't perform a manual restart :slight_smile:

So what could be sending these SIGTERMs? Apache itself? iworx? And why?

Thanks

Pascal

Is someone running a CGI proxy script on that server?

Hi

Well well well…

Normally no, but how can we be sure?

We sometimes check the suexec log and only see
pg-bannieres-pro
pg-mlpro
pg-recherche
CGI scripts running on this box.

Pascal

In fact, from suexec.log we have:

~# cat /var/log/httpd/suexec.log | awk '{print $8}' | sort | uniq

formmail-vf.pl
pg-bannierespro.cgi
pg-mlpro-admin.cgi
pg-mlpro.cgi
pg-recherche.pl

Well, we haven't checked whether any of these scripts have been renamed, but we might do that.

Pascal

Seeing these scripts gives me an idea.

For a few weeks we have had a very strange issue. All services on the server stop answering (ssh, http, ftp, even cron). The logs don't show anything special. CPU usage is quite good (less than 10%) and so is the load average (less than 0.5).

We first thought it might be a firewall problem, but now that I think about it, I don't see how crond could be brought down by a firewall issue, so I think we have to analyse this issue again.

So here come these CGI scripts. I think that a malicious CGI script might be causing this issue.

For example, yesterday our box last answered at 21:58:41 (last entry in messages).

Have a look at the times these CGI scripts were running:

~# cat /var/log/httpd/suexec.log | grep "2006-04-06 21:5" | awk '{print $1" "$2" "$4" "$6" "$8}' | tr "[" " " | sort | uniq

2006-04-06 21:50:20]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:50:21]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:51:14]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:51:16]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:51:53]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:51:56]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:52:02]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:52:54]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:52:57]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:53:24]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:53:28]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:53:47]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:53:50]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:54:14]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:54:16]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:54:38]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:54:41]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:55:01]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:55:03]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:55:26]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:55:28]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:55:57]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:55:59]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:56:18]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:56:20]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:56:42]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:56:45]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:57:03]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:57:06]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:57:25]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:57:31]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:57:49]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:57:51]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:58:10]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:58:13]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:58:31]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi
2006-04-06 21:58:33]: (xxx/xxx) (xxx/xxx) pg-mlpro-admin.cgi

PS: UID and GID removed for security reasons.

Well well well, it's strange. The same script was run 37 times in less than 10 minutes, and the last run is 15 seconds before the services on our box stopped answering.
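
The count above can be reproduced per script for any time window. A rough sketch, assuming each successful suexec.log entry has the form "[date time]: uid: (...) gid: (...) cmd: script" (which is what the field numbers in the awk command above imply); cgi_runs_in_window is a hypothetical helper name.

```shell
# Count suexec runs per script for log lines starting with a timestamp prefix.
#   $1 = path to suexec.log
#   $2 = literal timestamp prefix, e.g. "2006-04-06 21:5"
# Assumed entry shape (field 7 is "cmd:", field 8 the script name):
#   [2006-04-06 21:50:20]: uid: (u/u) gid: (g/g) cmd: script.cgi
cgi_runs_in_window() {
    grep -F "[$2" "$1" |
        awk '$7 == "cmd:" { n[$8]++ }          # skip error lines with another layout
             END { for (s in n) print n[s], s }' |
        sort -rn                               # busiest script first
}

# Usage (hypothetical path):
#   cgi_runs_in_window /var/log/httpd/suexec.log "2006-04-06 21:5"
```

Running it for several ten-minute windows would show whether a burst like this one is normal for that script or only appears just before an outage.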

I also remember that we have a recurrent qmail-smtpd segfault. This script is a mailing list management script, so maybe it could be the source of our issue.

I’ll investigate.

In the meantime, is there a way to limit this script (ulimit?) to be sure it doesn't bring the server down?
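
One possible way to cap a single script from the shell side is a small wrapper that sets ulimit values before starting it. This is only a sketch under assumptions: run_limited is a hypothetical name, the shell must support "ulimit -t" and "ulimit -v" (bash and dash do), and under suexec any wrapper would still have to pass suexec's own ownership and permission checks. It caps each invocation, not how often the script is called.

```shell
# Run a command with a CPU-time cap and a virtual-memory cap.
#   $1 = CPU seconds, $2 = virtual memory in KiB, remaining args = the command
# The function body is a subshell, so the limits never affect the caller.
run_limited() (
    cpu=$1
    mem_kb=$2
    shift 2
    ulimit -t "$cpu"    || exit 1   # kernel signals the process once CPU time is used up
    ulimit -v "$mem_kb" || exit 1   # allocations beyond this size fail
    exec "$@"
)

# Usage (hypothetical script path and values):
#   run_limited 30 100000 ./pg-mlpro-admin.cgi
```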

Do you think we are on the right track?

Any thoughts or experience with this kind of problem?

Thanks

pascal

I've had issues with proxy scripts that maxed out the suexec.log file. Go check your logfile size and see whether it's maxing out before being rotated. Either rotate it more often or send it to /dev/null, or decide what you want to do about that offending script (a Google search suggests it's some sort of mass-mailing CGI script, possibly used for spam).
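
A quick way to check this is to compare the file size against a ceiling. A sketch, assuming that "maxed out" means hitting the 2 GiB file size limit of programs built without large-file support (that figure is an assumption, not something stated in this thread); log_near_limit is a hypothetical helper.

```shell
# Print a file's size in bytes and succeed (status 0) if it is at or above
# 90% of a size limit.
#   $1 = file, $2 = limit in bytes (default 2147483647, an ASSUMED 2 GiB cap)
log_near_limit() {
    limit=${2:-2147483647}
    size=$(wc -c < "$1") || return 2    # status 2: file could not be read
    size=$((size + 0))                  # drop padding some wc versions print
    echo "$size"
    [ "$size" -ge $((limit / 10 * 9)) ]
}

# Usage (hypothetical path):
#   log_near_limit /var/log/httpd/suexec.log && echo "close to the limit"
```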

You can limit CPU usage on a per-vhost basis for CGIs. Just use RLimitCPU (http://httpd.apache.org/docs/2.0/mod/core.html#rlimitcpu). There's an RLimitMEM as well.

Chris

We already use RLimitCPU and RLimitMEM, and also LimitRequestBody, but for ALL httpd requests (in httpd.conf):

RLimitMEM 101145600
RLimitCPU 120
LimitRequestBody 8192000

Hmm, is that why the SIGTERM is sent? If the server reaches the memory or CPU limit, does it send a SIGTERM?

Pascal

Well well well, last night we found something very strange.

All our log files have been rotated by logrotate, but the file still being written to is the .1 version (for example, messages.1 keeps growing, not messages). One and only one file rotated properly: suexec.log.
It is the only one now being written to as suexec.log and not suexec.log.1.

The other strange thing is that there is a gap in this log. suexec.log.1 stops at 3:00 AM and suexec.log begins at 5:00 AM.
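
A rough sketch for spotting which logs are affected: list every log whose .1 copy has been modified more recently than the current file, which is what happens when a daemon keeps writing to the renamed file. stale_rotations is a hypothetical name, and the -nt test is supported by bash, dash and ksh, though not guaranteed in every sh.

```shell
# List logs whose rotated ".1" copy is newer than the current file.
#   $1 = log directory
stale_rotations() {
    for rotated in "$1"/*.1; do
        [ -e "$rotated" ] || continue          # the glob matched nothing
        current=${rotated%.1}                  # strip the ".1" suffix
        if [ -e "$current" ] && [ "$rotated" -nt "$current" ]; then
            echo "$current"
        fi
    done
}

# Usage: stale_rotations /var/log
```

A log listed here is probably still held open under its old name by the process writing it, so that process did not reopen its log file after the rotation.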

It looks like FusionHosting's idea was not so bad after all :wink:

So FusionHosting, which CGI was it for you?

We'll try to find which one might have done this, but it's very hard: we have more than a hundred clients on this box and a lot of them use CGI (not all, of course).

Any ideas?

Thanks

Pascal