
Tomcat is running a webapp under Windows. After a few days under very low load, the exception mentioned in the title starts to appear in the logs. From that point on, no new connections can be established, and the only fix is to reboot the server.

Environment:

  • Latest Tomcat 6
  • Windows Server 2008 R2
  • JDK 6 update 30
  • SQL Server 2008
  • Kerberos authentication

Evidence collected so far:

  • netstat shows no excessive amount of connections
  • ProcessExplorer shows no excessive amount of open file handles
  • system main memory usage is average
  • JVM heap usage is average
  • restarting Tomcat does not solve the problem
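To make the netstat check repeatable, the output can be tallied per TCP state and compared over several days. The following is a minimal sketch, not part of the original investigation: the class and method names are hypothetical, and it assumes the Windows `netstat -an` column layout (protocol, local address, foreign address, state).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts TCP rows per state in text captured from `netstat -an`.
class NetstatStateCounter {

    static Map<String, Integer> countStates(String netstatOutput) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : netstatOutput.split("\\r?\\n")) {
            String[] cols = line.trim().split("\\s+");
            // Assumed row layout: TCP  <local address>  <foreign address>  <STATE>
            // Header rows and UDP rows (which have no state column) are skipped.
            if (cols.length >= 4 && cols[0].equalsIgnoreCase("TCP")) {
                String state = cols[3];
                Integer old = counts.get(state);
                counts.put(state, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the expected input shape.
        String sample =
              "  Proto  Local Address          Foreign Address        State\n"
            + "  TCP    0.0.0.0:8080           0.0.0.0:0              LISTENING\n"
            + "  TCP    127.0.0.1:49152        127.0.0.1:49153        ESTABLISHED\n"
            + "  TCP    127.0.0.1:49153        127.0.0.1:49152        ESTABLISHED\n"
            + "  UDP    0.0.0.0:500            *:*\n";
        System.out.println(countStates(sample)); // prints {ESTABLISHED=2, LISTENING=1}
    }
}
```

A flat tally over time would be consistent with the observation above that netstat shows nothing unusual, which in turn suggests the exhausted resource is not visible at this level.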

Open questions:

  • if we were leaking connections, shouldn't they show up in netstat?
  • shouldn't a restart of the appserver resolve the problem, because the OS should free all process resources?
  • is there a way to trace the problem to its origin? E.g. installing monitoring software, maybe something similar to lsof etc.?
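One low-tech form of monitoring is a probe that opens and closes a single loopback connection and records the result, so the moment the OS starts refusing new sockets shows up with a timestamp. This is only a sketch under assumptions: the class name is hypothetical, it needs Java 7 or later for try-with-resources, and it detects the failure rather than tracing its origin.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Date;

// Hypothetical probe: one loopback connect/accept round trip per call.
class LoopbackProbe {

    /** Returns "ok" if a loopback connection could be opened, else the exception text. */
    static String probeOnce() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Port 0 lets the OS pick a free port; all three sockets are closed on exit.
            return "ok";
        } catch (IOException e) {
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Run this from a scheduled task and append the line to a log file.
        System.out.println(new Date() + " loopback probe: " + probeOnce());
    }
}
```

Running the probe in a separate JVM from Tomcat also indicates whether the failure is system-wide or limited to the Tomcat process.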

I'm out of ideas, any hints appreciated!

Michael Böckling
  • There are lots of questions similar to yours, please do a search and check them out first. – Some programmer dude Apr 10 '12 at 12:14
  • I did, believe me. None of them helped to resolve my problem, because I'm not seeing any of the symptoms I should be seeing, and none of them contain hints on how the source of the problem can be identified. – Michael Böckling Apr 10 '12 at 13:09

3 Answers


The reason we got this error is a bug in Windows Server 2008 R2 / Windows 7: the kernel leaks loopback sockets due to a race condition on machines with more than one core. This patch fixes the issue: http://support.microsoft.com/kb/2577795
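To see whether a given machine already lists the hotfix, the installed update IDs can be searched for the KB number. This is a rough sketch with hypothetical class and method names. It assumes the `wmic qfe get HotFixID` command is available, as it is on Windows Server 2008 R2 and Windows 7, and that its piped output is readable in the platform default encoding. A negative result is not conclusive, because a fix can also arrive as part of a later update that is listed under a different ID.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper: looks for a KB id in the list of installed hotfixes.
class HotfixCheck {

    /** True if any whitespace-separated token of the listing equals the KB id. */
    static boolean listsHotfix(String listing, String kbId) {
        for (String token : listing.split("\\s+")) {
            if (token.equalsIgnoreCase(kbId)) {
                return true;
            }
        }
        return false;
    }

    /** Runs `wmic qfe get HotFixID` (Windows only) and returns its text output. */
    static String readInstalledHotfixes() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("wmic", "qfe", "get", "HotFixID")
                .redirectErrorStream(true)
                .start();
        p.getOutputStream().close(); // the child gets no input from us
        StringBuilder out = new StringBuilder();
        // Assumption: default charset is adequate for the ASCII KB ids we look for.
        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (!System.getProperty("os.name").startsWith("Windows")) {
            System.out.println("Not Windows; nothing to check.");
            return;
        }
        boolean listed = listsHotfix(readInstalledHotfixes(), "KB2577795");
        System.out.println("KB2577795 listed: " + listed);
    }
}
```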

Michael Böckling
  • I may have just experienced this. Is this bug still around in 2014, and was the hotfix never added to any update? – Esko Piirainen Aug 21 '14 at 08:17
  • We are using Windows Server 2012 and are still facing this issue. Is there some other fix for it, or is the issue completely different? – Yadu Krishnan Feb 02 '15 at 07:40
  • @Buddy_Casino, I'm also getting this error, and all my cache storage nodes are on Win 2008 R2 servers. How was this conclusion reached? – george_h Feb 03 '15 at 12:27

I was running Alfresco Community 4.0d on Windows 7 64 bit and had the same symptoms and errors.

The problem was fixed with Microsoft's patch "Kernel sockets leak on a multiprocessor computer that is running Windows Server 2008 R2 or Windows 7" (http://support.microsoft.com/kb/2577795), i.e. the fix from Buddy Casino's answer (see below).

Another observation I'd like to add is that Windows connections (Internet Explorer, Remote Desktop, etc.) would work again about 5-10 minutes after the Alfresco services were shut down.

Alfresco is an excellent product and I was afraid I would have to scrap it. Fortunately, Stack Overflow came to the rescue!

Thanks again to Buddy Casino's answer.

Boo to the person who down-voted the Question.

mvanle
  • Same issue on Windows Server 2012 R2. Is any patch available to fix this issue? The app server (JBoss) looks fine and heap usage is below average, but a few hours after restarting the server I get this error. I checked the code and tested locally; there is no leak in the running code. – ilaiya Dec 29 '17 at 13:09

We are seeing the same thing on a similar setup: W2008R2, Tomcat 6.0.29, Java 1.6.0.25. Restarting Tomcat does not help, but restarting the server itself does, at least for a while. The last time it happened, we started shutting down individual services, and we believe we have narrowed it down to either an instance of Alfresco that is also running on the server or the Backup Exec Agent services. After those services (four in total) were stopped, the applications in Tomcat started working again, although, strangely, we were still seeing the buffer/connections error in the stdout log. We will need to wait for the problem to return before confirming which is the culprit, which could take anywhere from a few days to a week or more.

Any chance you are running either Alfresco or BE on your server?

J Jost