We have HAProxy 1.3.26 on a CentOS 5.9 machine with a 2.13 GHz Intel Xeon processor, acting as an HTTP and TCP load balancer for numerous services and handling a peak of ~2000 requests/second. It has been running fine for two years, but both traffic and the number of services are gradually increasing.
Of late we have observed that after a reload the old HAProxy process sticks around. On further investigation we found that the old process has a huge number of connections in TIME_WAIT state. We also noticed that `netstat` and `lsof` were taking a very long time to complete. Following http://agiletesting.blogspot.in/2013/07/the-mystery-of-stale-haproxy-processes.html we introduced `option forceclose`, but it interfered with various monitoring services, so we reverted it. Digging further, we realised that `/proc/net/sockstat` shows close to 200K sockets in the `tw` (TIME_WAIT) state, which is surprising because in `/etc/haproxy/haproxy.cfg` we have `maxconn` set to 31000 and `ulimit-n` to 64000. We had `timeout server` and `timeout client` at `300s` and reduced them to `30s`, but it did not help much.
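For reference, the relevant part of our haproxy.cfg looks roughly like the sketch below. Only the values quoted above (maxconn, ulimit-n, the timeouts, and the reverted `option forceclose`) are real; the frontend/backend sections are omitted and anything else shown is illustrative.

```
# Rough sketch of /etc/haproxy/haproxy.cfg (frontends/backends omitted)
global
    maxconn   31000       # global connection ceiling
    ulimit-n  64000       # file-descriptor limit for the process

defaults
    timeout client 30s    # was 300s, lowered while investigating
    timeout server 30s    # was 300s, lowered while investigating
    # option forceclose   # tried per the blog post above, reverted because
                          # it broke several monitoring checks
```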
Now our doubts are:
- Is such a high number of TIME_WAIT sockets acceptable? If yes, what is the number beyond which we should be worried? Looking at What is the cost of many TIME_WAIT on the server side? and Setting TIME_WAIT TCP, it seems there shouldn't be any issue.
- How can we decrease these TIME_WAITs?
- Are there any alternatives to `netstat` and `lsof` that perform well even with a very high number of TIME_WAIT sockets? (The snippet after this list shows how we are pulling the counts at the moment.)
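For completeness, this is roughly how we are reading the per-state socket counts right now. It only parses `/proc/net/sockstat`, so it returns instantly even with ~200K TIME_WAIT sockets, unlike `netstat -an` or `lsof`, which enumerate every connection:

```
# Print the TCP counters (inuse, orphan, tw, alloc, mem) from /proc/net/sockstat.
# "tw" is the number of sockets currently in TIME_WAIT.
awk '/^TCP:/ { for (i = 2; i <= NF; i += 2) print $i, $(i + 1) }' /proc/net/sockstat
```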