Tomcat 7 hangs after running for a few hours

Question

I am running Tomcat 7.0.52 as a servlet container with a PostgreSQL database, accessed through Hibernate 4.3 and JPA 2.1.

Nginx listens on port 8080 and proxies all requests to Tomcat on port 8888.

The server handles about 200 requests per second. After a few hours it stops responding to requests: I can't reach the Tomcat manager page or the servlet context, and every request ends in a timeout error. The process is still alive, though: my scheduled services keep running and can still access the database.

While it is stuck, Tomcat uses 0.04-0.08% CPU and PostgreSQL 0.01-0.02%, compared with 3-4% for Tomcat and 12-14% for PostgreSQL under normal load.

After I restart Tomcat, it works fine again.

I don't think the database is the problem: postgresql-9.3-main.log is empty even though logging is enabled (entries do appear when I do something wrong in psql).

I don't think it is an OutOfMemoryError or any other exception, because there are no exceptions or errors in any of the Tomcat log files (catalina.out and localhost.YYYY-MM-DD.log).

I don't think nginx is the problem, because requests to other ports and sites work fine.

I don't think it is a memory leak: the Java process consistently uses about 700-800 MB of memory, with no peaks around the time of the hang.

I have searched for similar problems, but none of the suggested fixes helped.

When I change acceptorThreadCount from 1 to 2, the server hangs much sooner.

It looks like something gets stuck while Tomcat is accepting connections. I have run out of ideas about what I am missing.

JVM options:

JAVA_OPTS="-Xms1024m -Xmx2048m -XX:MaxPermSize=256m"

Tomcat7 version info:

Server version: Apache Tomcat/7.0.52 (Ubuntu)
Server built:   Jul 24 2014 08:38:51
Server number:  7.0.52.0
OS Name:        Linux
OS Version:     3.13.0-53-generic
Architecture:   amd64
JVM Version:    1.7.0_79-b14
JVM Vendor:     Oracle Corporation

Nginx config file:

worker_rlimit_nofile 8192;
worker_processes 4;
timer_resolution 100ms;
worker_priority -5;

pid /run/nginx.pid;

events {
    worker_connections 2048;
    use epoll;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    output_buffers 2 512k;
    client_max_body_size 150M;

    gzip on;
    gzip_min_length 1100;
    gzip_buffers 64 8k; 
    gzip_comp_level 3;
    gzip_disable "msie6";
    gzip_http_version 1.1;
    gzip_proxied any;
    gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    keepalive_timeout 30;
    server_tokens off;
    reset_timedout_connection on;
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

    types_hash_max_size 2048;
    server_names_hash_bucket_size 64;
    server_names_hash_max_size 2056;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
    include blockips.conf;
}

Nginx server config:

server {
    listen 8080;
    server_name <my_ip>;

    proxy_headers_hash_max_size 512;
    proxy_headers_hash_bucket_size 64;

    location / {
        proxy_set_header X-Forwarded-For $http_x_real_ip;
        #proxy_set_header X-NginX-Proxy true;

        proxy_pass         http://127.0.0.1:8888/; 
        proxy_redirect     off;
    }
}

Connector config:

port="8888" 
protocol="org.apache.coyote.http11.Http11NioProtocol"
connectionTimeout="20000"
acceptorThreadCount="1"
maxThreads="500"
URIEncoding="UTF-8"
redirectPort="8443"

Thank you in advance.

Update

The problem is solved. I found the right solution here: https://stackoverflow.com/a/3731978/7289901

Hibernate was misconfigured: idle_test_period was higher than timeout. After I set these two values correctly, the server became very stable.
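
The relationship between the two settings can be expressed as a small startup check. This is only a sketch: the class and method names are hypothetical, and it assumes both values are positive numbers of seconds (it does not handle 0, which disables the respective feature).

```java
import java.util.Properties;

/** Hypothetical sanity check for the two c3p0 timing settings discussed above. */
final class PoolTimingCheck {

    /** True when idle connections are tested before they expire (both values in seconds). */
    static boolean isConsistent(int idleTestPeriodSeconds, int timeoutSeconds) {
        return idleTestPeriodSeconds < timeoutSeconds;
    }

    /** Reads the two Hibernate property keys and applies the check. */
    static boolean isConsistent(Properties props) {
        int idleTestPeriod = Integer.parseInt(props.getProperty("hibernate.c3p0.idle_test_period"));
        int timeout = Integer.parseInt(props.getProperty("hibernate.c3p0.timeout"));
        return isConsistent(idleTestPeriod, timeout);
    }
}
```

With idle_test_period set to 10 and timeout set to 30 the check passes; with the two values swapped it fails, which is the misconfiguration described above.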

Update 2

The full Hibernate configuration that allowed me to find the cause of the problem:

<property name="hibernate.c3p0.acquire_increment">3</property>
<property name="hibernate.c3p0.acquireRetryAttempts">3</property>
<property name="hibernate.c3p0.acquireRetryDelay">250</property>
<property name="hibernate.c3p0.idle_test_period">10</property>
<property name="hibernate.c3p0.min_size">0</property>
<property name="hibernate.c3p0.max_size">50</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.timeout">30</property>
<property name="hibernate.c3p0.checkoutTimeout">500</property>
<property name="hibernate.c3p0.debugUnreturnedConnectionStackTraces">true</property>
<property name="hibernate.c3p0.unreturnedConnectionTimeout">30</property>
<property name="hibernate.c3p0.numHelperThreads">5</property>

You need to be able to reproduce this in a non-production environment. Look at your code first. Tomcat works. Hook up a profiler. - Romain Hippeau, Dec 13 '16 at 12:35
I accept that I could have made a mistake in the code. But I log all incoming requests at the beginning of the doGet and doPost methods, and the requests stop being logged. I would understand if CPU usage grew to 100%, but the server doesn't even try to process or log requests, and CPU usage drops to a minimum. And as I said, the Tomcat manager doesn't respond either. That seems strange to me. - Vladimir, Dec 13 '16 at 12:46
Any particular reason you are running Nginx in front of Tomcat? - Klaus Groenbaek, Dec 13 '16 at 14:44
I know much more about configuring nginx than about configuring tomcat7, so this setup is the most reliable one for me. It lets me manage access to the server in a more predictable and secure way. I would like to believe so, at least. :) - Vladimir, Dec 14 '16 at 09:23

score 0 · Accepted Answer · answered Dec 13 '16 at 14:38

My initial guess is that something is wrong in your JPA code. You start out with low CPU on both Tomcat and DB, and end up with 3-4% on Tomcat and 12-14% on DB server.

If your Tomcat application is stateless, the scaling is practically linear, and even if you store data in the HttpSession there is little overhead until you start clustering Tomcat.

Databases also scale fairly well, provided you have the appropriate indexes and don't perform full table scans. Consider enabling slow query logging in PostgreSQL (the log_min_duration_statement setting) to see whether individual queries have long runtimes.

If you can't connect to the Tomcat manager, it is probably because all HTTP connector threads are in use. But you should still be able to connect with JVisualVM. JVisualVM has a CPU sampler; if you start it, you should be able to see where the time is spent. One problem here is that you can't just look at the CPU time (since most of the CPU is used by the DB), and if you look at self time, every previous step in the call stack will rate higher than your code (and Tomcat and Spring typically add 20 or so stack frames).

You could also take a thread dump and check what Tomcat's HTTP threads are doing (this is basically what the CPU sampler does), so you can see where they are stuck.
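
A thread dump is usually taken from outside the JVM, for example with the JDK's jstack tool. As an alternative, here is a minimal in-process sketch that could be called from code that still runs while requests hang, such as a scheduled job. The class name is hypothetical, and the thread-name prefix in the usage note is an assumption based on how Tomcat 7 typically names NIO connector threads for port 8888.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that collects stack traces of threads matching a name prefix. */
final class ConnectorThreadDump {

    /** Returns one text block per live thread whose name starts with the given prefix. */
    static List<String> dump(String namePrefix) {
        List<String> blocks = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().startsWith(namePrefix)) {
                continue;
            }
            StringBuilder block = new StringBuilder();
            block.append('"').append(thread.getName()).append("\" state=")
                 .append(thread.getState()).append('\n');
            // Each frame shows where the thread currently is, innermost call first.
            for (StackTraceElement frame : entry.getValue()) {
                block.append("    at ").append(frame).append('\n');
            }
            blocks.add(block.toString());
        }
        return blocks;
    }
}
```

Calling ConnectorThreadDump.dump("http-nio-8888-exec-") and logging the result would show whether the request threads are all waiting in the same place, for instance inside the connection pool.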

CPU Sampling and thread dump should give you ideas on where to focus your efforts. My guess is that it is JPA related.

It is possible to write JPA code that uses the database very inefficiently. Lazily loaded collections are often a good place to start looking. If you have an ER model with Company>-Employee>-Phone (1-N, 1-N) and you want to print the phone numbers of all employees in a company, you could start with the company, loop through its employee collection, and for each employee loop through the phone numbers. This results in 1 + N queries: one query to load the employees, plus one query per employee to load the phone numbers. A better solution is to select the data with a fetch join query, so the database loads all employees and phone numbers in a single operation.
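
The query arithmetic can be illustrated with a small simulation. This is not real JPA code: FakeDb is a hypothetical stand-in that only counts round trips, and the JPQL in the comment assumes entity and field names that do not appear in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simulation only: counts round trips for the two access patterns, no real JPA involved. */
final class QueryCountSketch {

    /** Hypothetical stand-in for the database; every load method costs one query. */
    static final class FakeDb {
        final Map<String, List<String>> phonesByEmployee = new LinkedHashMap<>();
        int queries = 0;

        List<String> loadEmployees() {
            queries++;
            return new ArrayList<>(phonesByEmployee.keySet());
        }

        List<String> loadPhones(String employee) {
            queries++;
            return phonesByEmployee.get(employee);
        }

        // Plays the role of a JPQL fetch join such as (entity names assumed):
        // select distinct e from Employee e join fetch e.phones where e.company = :company
        Map<String, List<String>> loadEmployeesWithPhones() {
            queries++;
            return phonesByEmployee;
        }
    }

    /** Lazy-loading pattern: 1 query for the employees plus 1 query per employee. */
    static List<String> phonesLazily(FakeDb db) {
        List<String> phones = new ArrayList<>();
        for (String employee : db.loadEmployees()) {
            phones.addAll(db.loadPhones(employee));
        }
        return phones;
    }

    /** Fetch-join pattern: everything arrives in a single query. */
    static List<String> phonesWithJoin(FakeDb db) {
        List<String> phones = new ArrayList<>();
        for (List<String> employeePhones : db.loadEmployeesWithPhones().values()) {
            phones.addAll(employeePhones);
        }
        return phones;
    }
}
```

For three employees, the lazy pattern issues four queries and the join pattern issues one, while both return the same phone numbers.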

Another common mistake is to add data to a lazily loaded collection, since this causes JPA to load all the data in the collection first.

Since you are using Spring, your entity manager is probably managed (and transaction scoped), so you probably don't have problems with data accumulating in your persistence context.

If you have queries that are read-only, you should check your JPA provider to see if there is a @QueryHint that can optimise this. By default JPA has to keep a copy of every object loaded into a persistence context, so it can check whether any modifications have been made when a transaction is committed. This process can take time (and serves no purpose for read-only queries).

You can enable query logging for JPA, but it tends to produce a lot of output.

Hope you find the source.

I found the solution to my problem here: http://stackoverflow.com/a/3731978/7289901 Hibernate was misconfigured: idle_test_period was higher than timeout. After I set these two values correctly, the server became very stable. Thank you for the help! You pointed me in the right direction! - Vladimir, Dec 14 '16 at 13:10
