Server unstable

[B]Hi All,

I have a problem with my server. After a certain amount of time, it crashes. I don’t know which process causes the crash (I can’t connect over SSH and I have to reboot it). Which log should I look in?

Thanks for your help :slight_smile:

[/B]

What Operating System? What Kernel version? Is there anything in the log files in /var/log ?

[B]Hi Fr3d,

I have CentOS 4.3 with kernel 2.6.9. In the secure log I only see this error:

May 22 13:29:20 hades sshd[2980]: Server listening on :: port 22.
May 22 13:29:20 hades sshd[2980]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.

There are no other errors :/.

Any solutions ? Thanks :)[/B]

Ahhh, I’m guessing you’re having the same problem as I once had.

When you have to reboot the box, does it say anything about Kernel panics on the screen?

If you are getting kernel panics, I suggest reading this page, and applying the patches Socheat posted: http://bugs.centos.org/view.php?id=1201

Applying those two patches fixed it for my box… Now it has 24 days of uptime, and counting :slight_smile:

[B]Thanks Fr3d, but there is no kernel panic error :(. I will test your solution :slight_smile:

–edit : When I boot, IPVS fails to start. Could that be the cause? –
[/B]

IPVS?

I really don’t know much about this problem… I only knew about the patches as the datacenter sent me the link…

[B]Ok, thanks Fr3d. Does anyone else have an idea? :slight_smile:

May 22 13:29:27 hades ipvsadm: Clearing the current IPVS table: succeeded
May 22 13:29:27 hades ipvsadm: Applying IPVS configuration failed

[/B]

Unless you have terminal access (which most remote hosts don’t offer) you would never see a kernel panic. If possible, contact your DC, let them know about the problem, and ask them to check the console on your box the next time it happens and see what it says.

But I had the problem of the box randomly crashing and it was the kernel problem with CentOS. Is your box 64 bit?

I can access my server and I know there is no kernel panic error. My processor is 32-bit.

What kind of “access”???

SSH is not the same as console. Once the kernel panics, you can’t SSH in, and there will be nothing in the log either.

Put it this way… Can you “touch” your server physically with your hand? Could you plug something into it if you wanted to?

Yes, I can “touch” the server; I have access to the server area. Sorry for my bad English … :o

Sounds like you may have something else trying to use port 22

netstat -lnp | grep 22

What’s the output of that command?

It should look something like this

[root@iworx ~]# netstat -lnp | grep 22
tcp 0 0 :::22 :::* LISTEN 4486/sshd
[root@iworx ~]#
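
A plain grep 22 also matches PIDs and other ports that happen to contain "22", so the output can be noisy. Here is a rough sketch of a tighter filter; listeners_on_port is just a name I made up, and it assumes the usual netstat -ln column layout where the local address is the 4th field:

```shell
# listeners_on_port is a hypothetical helper, not part of netstat.
# It reads `netstat -lnp` style output on stdin and keeps only the TCP
# lines whose local address (4th column) ends in the requested port.
listeners_on_port() {
    port="$1"
    awk -v p="$port" '
        $1 ~ /^tcp/ {
            # The port is whatever follows the last ":" in the local
            # address, which works for both 0.0.0.0:22 and :::22.
            n = split($4, parts, ":")
            if (parts[n] == p) print
        }'
}

# Usage (run as root so the PID/program column is filled in):
#   netstat -lnp | listeners_on_port 22
```

If more than one program shows up for port 22, that would point to a real conflict; a single sshd line is what you would normally expect.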

Hello,

[SIZE=2]All I can say is that the random crashes I had on my 2.6.9 kernel went away after updating to 2.6.16.x.[/SIZE]

[SIZE=2]I managed to do it myself, and everything is fine now after some troubleshooting, recompiling the kernel and such, but I would recommend that you hire someone to do it.[/SIZE]

[SIZE=2]Regards,[/SIZE]

[B]Hi All,

Tim >> The output of the command is:[/B]

[root@hades gimly]# netstat -lnp | grep 22
tcp 0 0 127.0.0.1:783 0.0.0.0:* LISTEN 3122/spamd -d -q -x
tcp 0 0 :::22 :::* LISTEN 2973/sshd
unix 2 [ ACC ] STREAM LISTENING 5203 2275/acpid /var/run/acpid.socket
[root@hades gimly]#

juangake >> If I understand correctly, I need to upgrade my kernel to 2.6.13 and the crashes will stop?

Gimly, I don’t know if 2.6.13.x will resolve the issue.

What worked for me was updating to the latest kernel available that day, which was 2.6.16.11. I was suffering random ‘freezes’ on a dedicated MySQL box. I had to reboot that box every two or three days, or even twice a day. Now it has 25 days of uptime and counting.

You’ll see on www.kernel.org that the current stable is 2.6.16.x

Regards,

[B]Ok juangake.

All >> Is it possible to use yum for this? I searched the CentOS website but found nothing :frowning: …[/B]

I just confirmed with another staffer and we agree that the output you posted for netstat -lnp looks normal.

There are two possibilities: a kernel panic or a hacker.

If it’s a kernel panic, an update may or may not fix it. Yum will only give you officially supported CentOS kernels. If Auto Update is ON in NodeWorx then you already have the most current official kernel build.

If you’ve never done a kernel update manually before, you should either do it sitting in front of the machine (not over SSH or serial console) or hire an admin to do it for you.

It’s also possible you’ve been hacked – run top and see if you have a high load or any abnormal processes running (for example, a process running as perl using 49% of your available memory, or something equally odd). Poke around the various temp directories (/tmp, /var/tmp) and use the ls -la command to list them, looking for directories whose names begin with a period (making them invisible to the dir command) or anything unusual.
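
If eyeballing ls -la output gets tedious, something like the following might help. It is only a sketch: list_hidden is a name I made up, and it simply wraps find to print every file or directory whose name starts with a period under the directories you give it.

```shell
# list_hidden is a hypothetical helper around find(1).
# It prints every hidden entry (name starting with ".") found under
# the directories passed as arguments, at any depth.
list_hidden() {
    # -mindepth 1 keeps the starting directories themselves out of the
    # results; permission errors are discarded.
    find "$@" -mindepth 1 -name '.*' 2>/dev/null
}

# Usage:
#   list_hidden /tmp /var/tmp
```

Some hidden entries in /tmp are legitimate (X11 and font server sockets, for instance), so treat the list as things to look at rather than proof of a break-in.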

I should note that running netstat -lnp and top immediately after a reboot, or when the box is behaving normally, is less likely to show the problem than running them when things are acting weird. If you do have a runaway process, rebooting effectively kills it but does not prevent it from happening again (as you’ve seen). I would keep an eye on things and try running both of these while you are experiencing problems to see what you find.
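
Since the box becomes unreachable when it goes wrong, one option is to record this information periodically so the last entries before a crash can be read after the reboot. A minimal sketch, assuming a made-up helper name (snapshot) and a log path of your own choosing:

```shell
# snapshot is a hypothetical helper; the log file path is whatever you pass in.
# It appends a timestamp, the load average, the top of the process list and
# the listening sockets to the given file.
snapshot() {
    logfile="$1"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        uptime
        top -b -n 1 | head -n 20   # batch mode, one iteration
        netstat -lnp
    } >> "$logfile" 2>&1 || true   # never fail, even if a tool is missing
    return 0
}

# Usage (as root, for example from a cron job every minute or so):
#   snapshot /var/log/snapshot.log
```

The file will grow over time, so it would need rotating or clearing once the cause has been found.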

If you haven’t already run a rootkit check:
http://www.rootkit.nl/projects/rootkit_hunter.html

[B]Ok Tim, thanks for your help. Thanks justec, I will test it :).

–edit : I ran Rootkit Hunter and it did not find any rootkit. –
–edit : In /var/log/messages, I can see several SSH connection attempts with the root login (the logins failed because my SSH server rejects root logins). I blocked the attacker’s IPs. Is it possible to block by hostname mask (example: F1296D51.w**-***.abo.wanadoo.fr)? –
[/B]