CentOS 4.3 servers crash; no panic output

Afternoon Gents,

We’ve been running into an issue with a few CentOS 4.3 servers with Interworx installed on them. The servers become unresponsive at seemingly unrelated times of day and days of the week.

The servers’ TCP/IP stacks remain functional, insofar as they still respond to ICMP echo requests, but all other services are down. The servers are unresponsive on the console, and no kernel panic messages have been written to ttyS0 or tty0. No panic messages are written to the logs, either. In fact, the logs don’t reveal anything particularly useful.
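For anyone who wants to catch this state from a second machine, here is a rough sketch of a probe that tells "answers ping but no services" apart from a box that is fully down. It assumes Python 3 on a separate monitoring host and the Linux iputils `ping` (its `-c` and `-W` options); the function names and the ports in the usage note are made up for illustration and are not part of any Interworx tooling.

```python
import socket
import subprocess


def ping_ok(host, timeout_s=2):
    """Return True if one ICMP echo request is answered (Linux iputils ping assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tcp_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def classify(ping_answered, port_results):
    """Classify a host from the ping result and a non-empty dict of {port: is_open}."""
    if not port_results:
        raise ValueError("port_results must contain at least one port")
    open_count = sum(1 for is_open in port_results.values() if is_open)
    if open_count == len(port_results):
        return "up"
    if open_count > 0:
        return "degraded"
    # No service port answers: the symptom in this thread is "hung" (ping still works).
    return "hung" if ping_answered else "down"
```

Run from cron on another box, something like `classify(ping_ok(host), {p: tcp_port_open(host, p) for p in (22, 80)})` would at least timestamp when a server went from "up" to "hung", which the server's own logs can't do.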

The problem doesn’t appear to be load-related, as we have one server that is fairly busy and another that is currently running only one site. Both machines are running the same kernel (2.6.9-34.EL), which is the latest release for CentOS 4.x.

As you can imagine, this is somewhat disconcerting. I’m wondering if anyone else has run into an issue similar to this; unfortunately there doesn’t seem to be much discussion of such a problem on the forums.

If further information is required, I’d be happy to provide it; however, the logs seem to be fairly quiet on the issue.

Thanks,
Mike

My server was having a similar problem, except that it would also stop responding to pings. That was with an older kernel, which I fixed by patching and recompiling it.
The server had to be rebooted today due to excessive load, and it has now booted an updated stock CentOS kernel, the same version you’re running. If it crashes, I’ll know why… Perhaps you could try the latest version from kernel.org and see if it helps your problem :wink:

It may come to that; however, I’d like to stick with the upstream kernel packages if possible.

Mike

Well, if you boot the latest version and let it run for a few days and it still crashes, at least you will have eliminated that area. If it doesn’t crash, then you will know that it was a kernel problem :wink:

I’ve got a dedicated box here with a similar problem. It just stops, and I can’t find anything in the logs.
I upgraded the kernel today to this one:
Linux 2.6.9-175.EL.iworx

I hope it solves the issue.

Is that an Iworx kernel built off a new official CentOS 4.x kernel?
I’m currently running Linux 2.6.9-150.EL.iworx

I believe that Iworx builds it. I have 3 boxes running the SMP one in a cluster setup, which works great. The other dedicated boxes have the latest original CentOS kernel, 2.6.9-34.0.1.

You can download the i386 Iworx kernels here:
http://updates.interworx.com/iworx/RPMS/cos4x/experimental/i386

I need to point out here that these are completely UNSUPPORTED by us and are used AT YOUR OWN RISK.

We too have had some internal difficulties with CentOS 4.2/3, and Socheat created a kernel with some CentOS-published patches that seems to have fixed the problem for our internal boxes and for a few clients who have had the same problems. We’ve kept relatively quiet about it and only given it out to people who have reported a specific error when the box crashed, with the stipulation that they use it at their own risk. That being said, we have had no issues with any of our internal boxes or any client boxes.

This is not supported by us, nor is it sanctioned by CentOS, so some DCs may not support it. These patches are NOT in the new CentOS kernel that was released last week, as all of their official kernel releases are recompiled RHEL kernels.

[quote=IWorx-Tim]I need to point out here that these are completely UNSUPPORTED by us and are used AT YOUR OWN RISK.[/quote]

Should have mentioned something like that in my post, sorry.

EDIT: By the way, I’ll let you all know whether or not the server crashes over the next week.

No biggie, and yes please let us know.

Gents,

Thanks for the information. We’ll be giving this a shot on a couple boxes that are otherwise particularly sad at the moment.

I’ll be sure to follow up here should things improve with the new kernel.

Cheers,
Mike

The server hasn’t crashed so far. I’ll keep you guys updated once I have a week of uptime here.

[B]Hi all,

I’ve installed kernel 2.6.9-175 and my server crashed again, with no kernel panic and no other errors. Does anyone have a solution? Otherwise I’ll install another Linux distribution that is supported by the Iworx CP.

Thanks in advance for your answers.
[/B]

System Uptime: 7 days 2 hours 43 minutes
And counting… that’s better than the 2-3 days of uptime the server had at first :smiley:

Arf :/, I’m not lucky with Linux :frowning: … I’ll migrate to another Linux distribution …

What kind of box is it? (CPU, RAM etc.)

[B]The configuration of the box is :

  • CPU : AMD Athlon 2200+ (1.8 GHz)

  • Motherboard : Asus A7N8x-x

  • RAM : 1 GB

  • HD : 2 x 40 GB Maxtor “Diamond MAX 8+” IDE (RAID 1) and 1 x 20 GB IBM IDE (backup)

I think I have a good box for the panel and CentOS … no?

I’m not a hosting enterprise …
[/B]

Should be OK.

Check your hard drives; I know that the Maxtor 8 and 9 series fail a lot (ever since they started making them in China).

You can also check the temperature of the CPU and of the north and south bridges.

Also check that the clock is not running too fast (sync it, then look at the time a few hours later and see if it’s still in sync). I have had this with some CentOS boxes.
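To put a number on that clock check, a small sketch like the one below works: note the reference time and the server's time right after syncing, note both again a few hours later, and compare how much time elapsed on each. The function name is made up for the example, and how you obtain the reference time (an NTP-synced workstation, for instance) is left to you.

```python
from datetime import datetime


def drift_seconds_per_day(ref_start, local_start, ref_end, local_end):
    """Estimate how many seconds per day the local clock gains (+) or loses (-).

    ref_* are readings from a trusted clock, local_* are the server's own
    readings taken at the same two moments.
    """
    ref_elapsed = (ref_end - ref_start).total_seconds()
    local_elapsed = (local_end - local_start).total_seconds()
    if ref_elapsed <= 0:
        raise ValueError("the second reference reading must be later than the first")
    # Scale the observed difference up to a full day (86400 seconds).
    return (local_elapsed - ref_elapsed) / ref_elapsed * 86400.0


# Example with made-up readings: the server gained 12 seconds in 4 hours.
example = drift_seconds_per_day(
    datetime(2006, 1, 1, 12, 0, 0),
    datetime(2006, 1, 1, 12, 0, 0),
    datetime(2006, 1, 1, 16, 0, 0),
    datetime(2006, 1, 1, 16, 0, 12),
)
```

A few seconds per day is ordinary for an unsynced PC clock; a clock that gains minutes within hours is what the check above is meant to catch.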

EDIT: Changed CP to CPU

OK, thanks for your help, WebXtra. I’ll test this.

up 18 days, 11:40

The new kernel has definitely fixed the problem here (well, that’s my conclusion) :smiley: