long-running qmail-smtpd processes

Since around the time Interworx 4.0 was released, we have been struggling with SMTP availability issues on our server: the SMTP service becomes unavailable. You can connect, but the server never responds.

I found out that this happens because the number of qmail-smtpd processes running reaches the value specified in /var/qmail/control/concurrencyincoming.
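For anyone who wants to check for the same condition, here is a minimal Python sketch that compares the number of running qmail-smtpd processes against the configured limit. It assumes the control file holds a single integer on its first line and that `ps` accepts `-e -o comm=` (as procps on Linux does). The function names are made up for illustration.

```python
import subprocess
from pathlib import Path

# Path taken from the post above; the single-integer format is an assumption.
LIMIT_FILE = Path("/var/qmail/control/concurrencyincoming")


def count_processes(ps_output, name="qmail-smtpd"):
    """Count lines of `ps -e -o comm=` output that equal the command name."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def read_limit(text):
    """Parse the first line of the control file contents as an integer."""
    return int(text.strip().splitlines()[0])


def slots_exhausted(running, limit):
    """True when every incoming slot is occupied."""
    return running >= limit


def report():
    """Query the live system and print the current usage (run on the server)."""
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    running = count_processes(ps_output)
    limit = read_limit(LIMIT_FILE.read_text())
    state = "EXHAUSTED" if slots_exhausted(running, limit) else "ok"
    print(f"qmail-smtpd: {running}/{limit} slots in use ({state})")
```

Calling `report()` on the mail server prints one status line; the helper functions can be tried on sample text without touching the system.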

Unfortunately, increasing this value does not help, because some qmail-smtpd processes seem to just never end. The only effect is that it takes longer to fill all the free “slots”.

As a temporary solution, I regularly kill all qmail-smtpd processes that have been running for more than an hour, but that is just a workaround, and I would really like to find a proper permanent solution.
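A rough Python sketch of this kind of cleanup is below. It is not the exact script I run; it assumes a `ps` that supports the `etimes` column (elapsed seconds, available in newer procps-ng releases), and the names `stale_pids` and `kill_stale` are made up for illustration.

```python
import os
import signal
import subprocess

MAX_AGE_SECONDS = 3600  # one hour, as in the workaround described above


def stale_pids(ps_output, name="qmail-smtpd", max_age=MAX_AGE_SECONDS):
    """Return PIDs of `name` processes older than max_age seconds.

    Expects lines of the form "<pid> <elapsed-seconds> <command>".
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, elapsed, comm = fields
        if comm.strip() == name and int(elapsed) > max_age:
            pids.append(int(pid))
    return pids


def kill_stale(dry_run=True):
    """Send SIGTERM to stale qmail-smtpd processes; only list them if dry_run."""
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = stale_pids(ps_output)
    for pid in pids:
        if dry_run:
            print(f"would terminate {pid}")
            continue
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process exited between ps and kill
    return pids
```

Running `kill_stale()` with the default `dry_run=True` only lists candidates, which is a sensible first step before scheduling `kill_stale(dry_run=False)` from cron.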

How can it be that some qmail-smtpd processes run for hours or even days or weeks? Shouldn’t they time out? I have not customized any timeouts, and the only non-default timeout seems to be /var/qmail/control/timeoutremote, which is probably preset by Interworx and is set to 600.

It may be just a coincidence, but is it possible that some change in Interworx 4.0 could cause this? It seems to have started around the time Interworx 4.0 was released. Should I open a support ticket?

Thanks very much for your help.
Regards, M.

We’ve actually been focusing on this this week. We’ve discovered a cause and a workaround, but not exactly a “fix” yet.

It seems that when TLS is used on port 25, an insecure connection attempt is rejected by SMTP, but the TLS session hangs (the remote server keeps it open), which keeps the process open. Since the SMTP session has already been rejected, it can’t time out, and apparently the TLS session isn’t considered by the script that enforces the timeout.

The workaround is to reject TLS on the primary port. To do this, you need to manually edit /service/smtp/run, and change DENY_TLS to 1.
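As an illustration only, the edit could be scripted along these lines in Python. The sketch assumes the run script sets the variable with a shell-style line such as `DENY_TLS=0`, which may not match your copy of the file, so check the file by hand first. The function names are hypothetical.

```python
import re
from pathlib import Path

RUN_FILE = Path("/service/smtp/run")  # path from the post above

# Assumption: the variable is assigned as `DENY_TLS=<value>`,
# optionally preceded by `export`, at the start of a line.
_DENY_TLS_RE = re.compile(r"^([ \t]*(?:export[ \t]+)?DENY_TLS=)\S*", re.MULTILINE)


def set_deny_tls(script_text, value="1"):
    """Return script_text with every DENY_TLS assignment set to value."""
    return _DENY_TLS_RE.sub(lambda m: m.group(1) + value, script_text)


def apply_workaround():
    """Rewrite the run script in place, keeping a .bak copy of the original."""
    original = RUN_FILE.read_text()
    updated = set_deny_tls(original)
    if updated != original:
        RUN_FILE.with_name(RUN_FILE.name + ".bak").write_text(original)
        RUN_FILE.write_text(updated)
    return updated != original
```

`apply_workaround()` returns `False` when nothing changed, which would suggest the assumed line format does not match and the file needs to be edited manually.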

Thanks for the reply. I’ll stay with my current workaround (regularly killing qmail-smtpd processes over one hour old from crontab), but please let me know here when you find the fix. Thanks.