
I have a Hibernate, Spring, Debian, Tomcat, MySQL stack on a Linode server in production with some clients. It's a Spring multitenant application that hosts webpages for about 30 clients.

The application starts fine, but after a while I start getting this error:

java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
at java.net.ServerSocket.implAccept(ServerSocket.java:453)
at java.net.ServerSocket.accept(ServerSocket.java:421)
at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
at java.lang.Thread.run(Thread.java:662)

Before this error is thrown, however, Nagios alerts me that the server has stopped responding to pings.

Previously I had nginx as a proxy, and instead I was getting these nginx errors per request, and had to restart Tomcat anyway:

2014/04/21 12:31:28 [error] 2259#0: *2441630 no live upstreams while connecting to upstream, client: 66.249.64.115, server: abril, request: "GET /catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox= HTTP/1.1", upstream: "http://appcluster/catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox=", host: "www.anabocafe.com"
2014/04/21 12:31:40 [error] 2259#0: *2441641 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 200.74.195.61, server: abril, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "www.oli-med.com"

This is my server.xml Connector configuration:

<Connector port="80" protocol="HTTP/1.1"
           maxHttpHeaderSize="8192"
           maxThreads="500" minSpareThreads="250"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           connectionTimeout="20000" disableUploadTimeout="true"
           acceptorThreadCount="2" />

I tried changing the ulimit. Using this tutorial I was able to change the hard limit on open file descriptors for the user running Tomcat, but it did not fix the problem; the application still hangs.
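For reference, the usual way to raise it on Debian is an entry in /etc/security/limits.conf for the user that runs Tomcat (the user name below is just an example; 65535 is the value I raised it to), checked afterwards from a fresh shell:

# /etc/security/limits.conf  (applied via pam_limits at login)
tomcat  soft  nofile  65535
tomcat  hard  nofile  65535

# verify as the Tomcat user in a new session
ulimit -Sn   # soft limit
ulimit -Hn   # hard limit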

The last time I had to restart the server, it had run for about 3 hours, and I had roughly this many open socket connections:

lsof -p TOMCAT_PID | wc -l
632 (more or less; I did not write down the exact number)

This number suddenly starts growing.
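A quick way to break that count down by TCP state, to see whether it is ESTABLISHED or CLOSE_WAIT sockets that are piling up, is something like:

# open TCP sockets of the Tomcat process, grouped by state
lsof -p TOMCAT_PID -a -i TCP | awk 'NR > 1 {print $NF}' | sort | uniq -c | sort -rn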

I have some applications very similar to this one on other servers. The difference is that they are stand-alone versions while this one has a multitenant architecture. I notice that in this app I am getting these kinds of socket connections, which do not occur in the stand-alone version of any of the other installations:

java    11506 root  646u  IPv6 136862      0t0     TCP lixxx-xxx.members.linode.com:www->180.76.6.16:49545 (ESTABLISHED)
java    11506 root  647u  IPv6 136873      0t0     TCP lixxx-xxx.members.linode.com:www->50.31.164.139:37734 (CLOSE_WAIT)
java    11506 root  648u  IPv6 135889      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-54-247-188-179.eu-west-1.compute.amazonaws.com:28335 (CLOSE_WAIT)
java    11506 root  649u  IPv6 136882      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-54-251-34-67.ap-southeast-1.compute.amazonaws.com:19023 (CLOSE_WAIT)
java    11506 root  650u  IPv6 136884      0t0     TCP lixxx-xxx.members.linode.com:www->crawl-66-249-75-113.googlebot.com:39665 (ESTABLISHED)
java    11506 root  651u  IPv6 136886      0t0     TCP lixxx-xxx.members.linode.com:www->190.97.240.116.viginet.com.ve:1391 (ESTABLISHED)
java    11506 root  652u  IPv6 136887      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-50-112-95-211.us-west-2.compute.amazonaws.com:19345 (ESTABLISHED)
java    11506 root  653u  IPv6 136889      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-54-248-250-232.ap-northeast-1.compute.amazonaws.com:51153 (ESTABLISHED)
java    11506 root  654u  IPv6 136897      0t0     TCP lixxx-xxx.members.linode.com:www->baiduspider-180-76-5-149.crawl.baidu.com:31768 (ESTABLISHED)
java    11506 root  655u  IPv6 136898      0t0     TCP lixxx-xxx.members.linode.com:www->msnbot-157-55-32-60.search.msn.com:35100 (ESTABLISHED)
java    11506 root  656u  IPv6 136900      0t0     TCP lixxx-xxx.members.linode.com:www->50.31.164.139:47511 (ESTABLISHED)
java    11506 root  657u  IPv6 135924      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-184-73-237-85.compute-1.amazonaws.com:28206 (ESTABLISHED)

They are some kind of automated connections, I guess.

So my question is:

How can I determine whether the problem is caused by my code, by the server, or by some kind of attack, and which approach would you recommend to figure this out?

Thank you in advance :)

Ernest

3 Answers


OK, it turns out that the problem was the JDBC connection pool settings: maxActive was set to 20 connections. I changed the limit to 200 and the problem stopped.
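For context, if the pool is declared as a JNDI Resource in context.xml, the definition with the raised limit looks roughly like this (in a Spring setup the same properties would go on the DataSource bean instead; everything here except maxActive is a placeholder):

<!-- Sketch of a JNDI DataSource using the Tomcat JDBC pool with the raised
     limit. Resource name, driver, URL and credentials are placeholders;
     only maxActive="200" reflects the actual change. -->
<Resource name="jdbc/appDS"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost:3306/appdb"
          username="appuser"
          password="secret"
          maxActive="200"
          maxIdle="20"
          minIdle="10"
          maxWait="10000" />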

The way I figured out that this was the problem was thanks to AppDynamics' wonderful tool, which lets you check a great deal of metrics under Application Infrastructure Performance.

I also found this wonderful article about the subject, which helped me tune my app:

http://www.tomcatexpert.com/blog/2010/04/01/configuring-jdbc-pool-high-concurrency

The official documentation also helped:

https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html

I guess that the arriving connections started queries which first overwhelmed the server's ability to respond and afterwards filled the OS socket limits; in Linux, open sockets are open files. I hope this helps someone!
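If you want to double-check what the running JVM actually got (a raised ulimit only applies to processes started after the change, and depends on how Tomcat is launched), the proc filesystem shows it directly:

# the open-files limit the running Tomcat process actually has
grep 'open files' /proc/TOMCAT_PID/limits

# how many descriptors (regular files plus sockets) it currently holds
ls /proc/TOMCAT_PID/fd | wc -l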

EDIT

Hello! This solution solved the issue in the short term, but another error regarding the JDBC connections appeared: the application was not closing the connections. I opened and solved a ticket regarding that issue here.
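For reference, the usual way to make sure pooled connections are always returned is try-with-resources; a minimal sketch (the JNDI name, table and query are made-up placeholders, not the code from that ticket):

import javax.naming.InitialContext;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ClientLookup {

    // Looks up a value using a pooled connection and guarantees the
    // connection goes back to the pool, even when an exception is thrown.
    public String findClientName(int clientId) throws Exception {
        DataSource ds = (DataSource) new InitialContext()
                .lookup("java:comp/env/jdbc/appDS"); // hypothetical JNDI name

        String sql = "SELECT name FROM client WHERE id = ?"; // placeholder query
        try (Connection con = ds.getConnection();            // close() returns it to the pool
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, clientId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        } // connection, statement and result set are all closed here
    }
}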

Ernest

Have you checked the ulimit for the user that is running Tomcat? Linux has a limit of 1024 open files by default. More on that in How do I change the number of open files limit in Linux?

There is a possibility that you have too many connections configured, or that for some reason you are not properly closing some I/O streams (highly unlikely).

I would approach it by increasing the ulimit and then running some load tests to see what is spiking the file usage.
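For example, something as simple as this, run while the load test (JMeter, ab, etc.) is going, shows whether the descriptor count keeps climbing (the PID is a placeholder):

# refresh the open-descriptor count of the Tomcat process every 2 seconds
watch -n 2 'lsof -p TOMCAT_PID | wc -l'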

Mite Mitreski
  • Hello Mite, thank you for the answer... I did change the ulimit; I wrote about it in the original post, after the server.xml Connector description. I raised it to 65535. I did not get that error, but Tomcat stopped responding and the number of open files kept rising. I probably would have gotten that error if I had left it alone, but I had to reboot. I tried load tests with JMeter; if I send a lot of requests per second it slows down, but I'm not sure how to detect what's causing it. – Ernest Apr 23 '14 at 22:55
  • When you run lsof -p TOMCAT_PID, what are the files that are opened: connections or something else? – Mite Mitreski Apr 24 '14 at 21:53
  • Hello Mite, mostly connections, from robots and web crawlers I think. I just made a script that watches for this and logs the output of lsof -p TOMCAT_PID, so I can give you a precise answer. Thank you again for answering; I will update with more info soon. – Ernest Apr 25 '14 at 23:41

A bit late, but maybe a help/hint for anyone struggling with this issue. We had the same strange problem every now and then (our Tomcat service is restarted every night, which cleans up the open handles, so the error did not happen that often).
We use an Apache proxy with the AJP protocol. The problem was a wrong protocol implementation.
Our connector config is now the following:

<Connector
        port="8009"
        protocol="org.apache.coyote.ajp.AjpNioProtocol"
        connectionTimeout="1800000"

        acceptorThreadCount="2"
        maxThreads="400"
        maxConnections="8192"
        asyncTimeout="20000"
        acceptCount="200"
        minSpareThreads="40"
        compression="on"
        compressableMimeType="text/html,text/xml,text/plain"

        enableLookups="false"
        URIEncoding="UTF-8"
        redirectPort="8443" />

Please note this: protocol="org.apache.coyote.ajp.AjpNioProtocol"
This implementation did the trick for us: no more leaked file handles. Further information can be found here: https://itellity.wordpress.com/2013/07/12/getting-rid-of-close_waits-in-tomcat-67-with-mod_proxy/
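For completeness, the Apache side that forwards to this connector is a mod_proxy_ajp mapping along these lines (host, port and paths are assumptions, not our exact vhost):

# Hypothetical httpd vhost fragment; requires mod_proxy and mod_proxy_ajp
ProxyPass        "/" "ajp://localhost:8009/"
ProxyPassReverse "/" "ajp://localhost:8009/"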
I hope this helps someone. Have a nice day!

Thomas Lang