IWORX needs a workaround for a Red Hat RPM bug

Red Hat bug 73097 is closed, but clearly not resolved. It affects InterWorx because the automatic updates call yum (itself a front end to rpm), so they can get bitten by this bug. In my case, updates hadn’t run for a long time. I didn’t notice until I logged in on the CLI to install some software and yum hung. When I ran strace on it, I noticed that the latest rpm for RHL9 still has the problem (and, from my research, Fedora may as well).


# strace -p 6283
futex(0x405bb400, FUTEX_WAIT, 0, NULL)  = -1 EINTR (Interrupted system call)

[this is where a "pkill -9 rpm" was issued at a different terminal]
--- SIGINT (Interrupt) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
futex(0x405bb400, FUTEX_WAIT, 0, NULL)  = -1 EINTR (Interrupted system call)
+++ killed by SIGKILL +++

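Once it’s hung like this, only a kill -9 will stop the rpm process(es). However, once a SIGKILL is sent to rpm, the database WILL be left corrupted. The only fix for this is to delete the stale rpm database environment files (the __db.* files; the package data itself, such as Packages, is not touched):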


# rm /var/lib/rpm/__db.*
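Before running that rm, it is worth making sure nothing is still using those files. The following is a minimal bash sketch, assuming the default location /var/lib/rpm and that pgrep (from procps) is available; the function name clear_rpm_db_locks and the RPM_DB_DIR variable are hypothetical, not part of rpm or InterWorx:

```shell
#!/bin/bash
# Hypothetical helper: remove rpm's stale Berkeley DB environment files
# (__db.*) only once no rpm or yum process is left that could be using them.
# RPM_DB_DIR defaults to /var/lib/rpm; override it to try this on a scratch dir.
RPM_DB_DIR="${RPM_DB_DIR:-/var/lib/rpm}"

clear_rpm_db_locks() {
    # Removing the files while rpm still has them open could make things worse.
    if pgrep -x rpm >/dev/null || pgrep -x yum >/dev/null; then
        echo "rpm/yum still running; kill -9 it first" >&2
        return 1
    fi
    # Only the __db.* files are removed; Packages and the other databases stay.
    rm -f "${RPM_DB_DIR}"/__db.*
}
```

As root, after the hung processes have been killed, calling clear_rpm_db_locks does the same as the rm above but refuses to run while an rpm or yum process is still alive.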

When I discovered the problem on my server, there were 23 hung yum update processes. Perhaps a cron job could be run to look for the problem and kill off any hung processes it finds (if an rpm-related process is killed, don’t forget to delete the /var/lib/rpm/__db.* files).
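A rough bash sketch of such a cron job is below. Everything in it is an illustrative assumption, not something InterWorx or Red Hat ships: the script, the function names, the rpm-watchdog log tag, and especially the rule that any rpm or yum process older than MAX_AGE (one hour here) counts as hung. It kills those processes with SIGKILL, since nothing else stopped them in the strace above, and then clears the __db.* files only if no rpm or yum process survived:

```shell
#!/bin/bash
# Hypothetical cron watchdog for hung rpm/yum processes. The one-hour
# threshold, names and log tag are illustrative assumptions only.
MAX_AGE="${MAX_AGE:-3600}"              # seconds before rpm/yum counts as hung
RPM_DB_DIR="${RPM_DB_DIR:-/var/lib/rpm}"

# Convert ps "etime" output ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    local t="$1" days=0 h=0 m=0 s=0
    local -a parts
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"
        t="${t#*-}"
    fi
    IFS=: read -r -a parts <<< "$t"
    case "${#parts[@]}" in
        3) h="${parts[0]}"; m="${parts[1]}"; s="${parts[2]}" ;;
        2) m="${parts[0]}"; s="${parts[1]}" ;;
        *) s="${parts[0]}" ;;
    esac
    # 10# forces base 10, so fields like "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print PIDs of rpm/yum processes running at least $1 seconds (default MAX_AGE).
find_hung_pids() {
    local max_age="${1:-$MAX_AGE}" pid etime comm
    ps -eo pid=,etime=,comm= | while read -r pid etime comm; do
        case "$comm" in
            rpm|yum) ;;
            *) continue ;;
        esac
        if [ "$(etime_to_seconds "$etime")" -ge "$max_age" ]; then
            echo "$pid"
        fi
    done
}

main() {
    local pids
    pids=$(find_hung_pids)
    [ -z "$pids" ] && return 0
    logger -t rpm-watchdog "killing hung rpm/yum processes: $pids"
    # $pids is left unquoted on purpose so each PID becomes its own argument.
    kill -9 $pids 2>/dev/null
    sleep 2
    # Do not touch the DB files while any rpm/yum is still alive.
    if [ -n "$(find_hung_pids 0)" ]; then
        logger -t rpm-watchdog "rpm/yum still running; leaving __db.* alone"
        return 1
    fi
    # A SIGKILLed rpm leaves the DB environment unusable, so clear it.
    rm -f "${RPM_DB_DIR}"/__db.*
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main
fi
```

It could be scheduled from /etc/crontab with a line such as "*/30 * * * * root /usr/local/sbin/rpm-watchdog.sh" (the path is hypothetical). A fixed age threshold is crude: a legitimately slow update would be killed too, so MAX_AGE should be comfortably longer than a normal yum run on the server.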

Again, this is not an InterWorx bug; it’s a Red Hat bug. I was just hoping that, since IWORX runs yum update automatically, a check for this could run as well. If I hadn’t logged in to install a package, I hate to think how long it would have taken me to notice, and how long a critical update would have gone uninstalled.