Has anyone running an Opteron 165/170 with RHES 4 64-bit and clustered Iworx noticed any issues?

Iworx (Chris) manually set this up, and the servers seem fine when idle. I set up a 2-node cluster using least connections with persistence, and the system running the app (a very CPU-intensive CGI script) kept locking up and requiring manual reboots, or services like httpd would eventually die. Memory usage seemed to skyrocket; in general, system resources just got too bogged down.

I have since switched it over to Round Robin, which isn't ideal, but it still seems to work. I've still had one server lock up without any log data to go on, but that's roughly once a week now rather than once every 48 hours.

I'm just wondering whether this could be a 64-bit, RHES4, or Iworx clustering issue at all. I don't doubt the CGI app under heavy load is also a factor, and I'm continuing to watch that.

Also, is it possible to do RR with persistence? It seems like an oxymoron, or it may bring back the same problems as before with little benefit, but I'm just curious whether it can be done.

Are the boxes flat out crashing or are the reboots just to flush things out?

And regarding RR+persistence, it's absolutely doable.

Chris
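Round robin with persistence is less of an oxymoron than it sounds. The toy model below is a minimal sketch, assuming the cluster balances through LVS (the thread mentions LVS kernel patches), where a scheduler such as `rr` or `lc` is combined with a persistence timeout (ipvsadm's `-p` option). The scheduler is only consulted for a client IP that has no live persistence entry; after that, the client sticks to the same node until the timeout lapses. All class and method names here (`RealServer`, `PersistentScheduler`, `connect`, `disconnect`) are hypothetical, and the single per-client expiry timer is a simplification of how LVS actually manages its templates.

```python
import time
from dataclasses import dataclass
from itertools import cycle


@dataclass
class RealServer:
    """Hypothetical stand-in for one cluster node."""
    name: str
    active_conns: int = 0


class PersistentScheduler:
    """Toy model of scheduler + persistence (not the real LVS code).

    The first connection from a client IP is placed by the underlying
    algorithm ("rr" = round robin, "lc" = least connections). Later
    connections from the same IP reuse that server until the entry expires.
    """

    def __init__(self, servers, algorithm="rr", timeout=300.0, clock=time.monotonic):
        if algorithm not in ("rr", "lc"):
            raise ValueError("algorithm must be 'rr' or 'lc'")
        self.servers = list(servers)
        self.algorithm = algorithm
        self.timeout = timeout
        self.clock = clock
        self._rr = cycle(self.servers)
        self._templates = {}  # client_ip -> (server, expires_at)

    def _pick(self):
        if self.algorithm == "rr":
            return next(self._rr)
        # Least connections: fewest active connections wins; ties go to the
        # server listed first.
        return min(self.servers, key=lambda s: s.active_conns)

    def connect(self, client_ip):
        now = self.clock()
        entry = self._templates.get(client_ip)
        if entry is not None and entry[1] > now:
            server = entry[0]  # persistence: reuse the earlier choice
        else:
            server = self._pick()  # no live entry: ask the scheduler
        # Simplification: restart the expiry timer on every connection.
        self._templates[client_ip] = (server, now + self.timeout)
        server.active_conns += 1
        return server

    def disconnect(self, server):
        server.active_conns -= 1


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo
    rr = PersistentScheduler(
        [RealServer("node1"), RealServer("node2")],
        algorithm="rr", timeout=300, clock=lambda: fake_now[0],
    )
    print(rr.connect("10.0.0.1").name)  # node1 (first RR pick)
    print(rr.connect("10.0.0.2").name)  # node2 (next RR pick)
    print(rr.connect("10.0.0.1").name)  # node1 again (persistence)
```

One consequence of this design is that, with persistence on, neither `rr` nor `lc` can move a client that is already pinned, so a single heavy client keeps loading the same node for the whole timeout regardless of which algorithm picked it.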

They were flat-out locking up to the point where I couldn't SSH into them, and even when I went through a VPN to the IPMI console, it wasn't always available because the box wasn't responding.

I did catch two of the CGI processes apparently stuck and eating up a lot of memory/CPU, but that only happened once.

Other times, while I was using least-connection load balancing, memory usage was always high, to the point of Apache shutting itself down and restarting to clear memory, and httpd connections were consistently in the 250-300 range. I've tried everything in tweaking the httpd.conf settings, and connections now stay around 100 with about the same number of users, but the fact that the CGI processes aren't killed when memory use is high is worrisome.

It just seems weird that least connection with persistence would put so much extra demand on the servers' memory or stability, unless I'm mistaken about the cause here.

I doubt it's a load issue when it comes to LC vs. RR. On our CentOS boxes we've applied some patches from the LVS folks because of locking problems they had. That may be the cause on your box as well, since RHEL and CentOS are so similar. I can have socheat tell you which patch(es) he installed if you'd like to patch your RHEL kernel as well.

Chris

That’d be great.

I'm also planning to test another 2 licenses on a pair of Dell Xeon servers running 64-bit CentOS.

Is the installer good to go for that or is a manual install still necessary?

4.3 x86_64 is still fussy install-wise.

Hmm, OK, I think I'll go ahead with a 32-bit install then. Is it possible for me to use yum to uninstall Apache 2 and then compile Apache 1 (or have yum install it) for use with InterWorx? Or is Apache 2 required?

Apache 1 was supported, FusionHosting, but official support for it has waned. Without doing some regression testing, I'd say stick with Apache 2.

Chris

An update on the issues I described above.

I caught some abuse and installed some Apache mods to curb it. I also removed the Cluster Manager from the "cluster", so now only the node server is handling HTTP requests. I'm waiting on a 3rd server to try a cluster again and see how performance fares. It's been as stable as could be expected with the load it's under.

So far, from playing with the CM and a node together in a cluster under high demand, the CM just can't handle its role and heavy web traffic at the same time; it's just asking for trouble.