
I am using JBoss AS 4.2.3 along with the Seam framework. My CPU usage increases as the number of users increases, and it hits 99% for just 80 users. We also use Hibernate, EJB3 and Apache with mod_jk for load balancing.

When I take a thread dump, all the runnable threads are performing the same activity, with the following trace:

at java.net.SocketInputStream.socketRead0(Native Method) 
at java.net.SocketInputStream.read(SocketInputStream.java:129) 
at org.apache.coyote.ajp.AjpProcessor.read(AjpProcessor.java:1012) 
at org.apache.coyote.ajp.AjpProcessor.readMessage(AjpProcessor.java:1091) 
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:384) 
at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:366) 
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:446) 
at java.lang.Thread.run(Thread.java:662)

I am not able to interpret this stack trace. I also find that even after the users have logged out, the CPU utilization stays the same, with the threads in the same state.

Arjan Tijms
Dwarakanath

2 Answers


These threads are attempting to read from a Socket connection. In this case they are waiting for the next request to be sent to the server from mod_jk in Apache. This is quite normal and they probably are not the reason for your CPU usage.

At this point you really need to go and run your application through a profiler.
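If installing a full profiler is not an option but a JBoss restart is, the HPROF sampling agent bundled with the Sun JDK is a rough substitute. A minimal sketch, assuming JBoss is started through the standard run.sh/run.conf and that the interval/depth values below are just examples:

#Add to JAVA_OPTS in JBOSS_HOME/bin/run.conf to enable CPU sampling
JAVA_OPTS="$JAVA_OPTS -agentlib:hprof=cpu=samples,interval=20,depth=30"
#On shutdown the samples are written to java.hprof.txt in the directory
#JBoss was started from; its "CPU SAMPLES" section lists the hottest methods.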

If you are unable to run a profiler on the system (i.e. it's a production box), the next best thing is to take many stack dumps, each a couple of seconds apart, and then go through them by hand, matching up the thread IDs. You need to look for the threads that are running your code and don't seem to have changed between dumps.

It is a very tedious task and doesn't always get clear results, but without a profiler or some sort of instrumentation you won't be able to find where all that CPU is going.
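A rough sketch of that approach on Linux, assuming the JDK's jstack is on the path; <pid> and <tid> are placeholders for the JBoss process id and a thread id, not values from your setup:

#Take a thread dump every 5 seconds; compare the RUNNABLE threads by ID
for i in 1 2 3 4 5; do
  jstack <pid> > dump.$i.txt
  sleep 5
done
#Cross-check with the busiest native threads: the TID column from top -H,
#converted to hex, matches the nid= value in the jstack output
top -b -H -n 1 -p <pid> | head -40
printf '%x\n' <tid>

If jstack is not available, kill -3 <pid> sends the thread dump to JBoss's console log instead.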

Gareth Davis
  • Is it possible that my requests are so heavy that the threads keep reading them all the time? Also, my Apache and JBoss instances are on the same physical box. – Dwarakanath Jun 13 '11 at 13:10
  • If you keep taking stack dumps you would see more activity than just `socketRead0`; JBoss would at least be doing something else as well. The brutal truth is that the threads you need to look at first are the ones running your application code; JBoss and Apache don't really add a huge CPU overhead in the grand scheme of things. – Gareth Davis Jun 13 '11 at 13:21
  • Normally I would expect waiting threads to be in status WAITING and then, in the case of Tomcat (or JBoss), waiting on `org.apache.tomcat.util.net.JIoEndpoint$Worker`. I've seen cases where one of the AJP daemon threads was also 'stuck' in socketRead0 with status RUNNABLE, while all the others were in the mentioned WAITING state. – Arjan Tijms Jun 13 '11 at 13:24
  • Thank you all again. I see that all my socketRead0() threads are always RUNNABLE. Also, my lbmethod is S in workers.properties, and we are using JOSSO for authentication. When we run a load test we find that even after logout has happened, the utilization does not come down. Any thoughts on this? – Dwarakanath Jun 13 '11 at 14:01
  • You could temporarily switch to mod_proxy/mod_proxy_http to replace mod_jk and connect to the HTTP connector of JBoss; if the CPU usage is much lower then there really is an issue with the mod_jk/AJP setup (a sketch of such a setup follows these comments). – Gareth Davis Jun 13 '11 at 14:08
  • Thank you. But is there a way to kill these AJP threads? They don't get killed by the connection timeout. I have also installed APR and tried it, to no avail. – Dwarakanath Jun 13 '11 at 15:34
  • I doubt there is a direct way other than restarting JBoss. Your workers.properties file will have some sort of minimum connection setting; you might want to set that low so that mod_jk closes the connections it is no longer using. – Gareth Davis Jun 13 '11 at 15:49
  • I did one more test, keeping the load balancer method as R. Now I find that when the users log out the utilization comes down. But I also find that when the user session times out, the AJP threads are not getting killed. – Dwarakanath Jun 14 '11 at 05:28
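For reference, a minimal sketch of the temporary mod_proxy test suggested in the comments above, assuming JBoss's default HTTP connector on port 8080 on the same box; the /myapp context path is hypothetical:

#httpd.conf - bypass mod_jk for a test and proxy straight to the JBoss
#HTTP connector (requires mod_proxy and mod_proxy_http to be loaded)
ProxyPass        /myapp http://localhost:8080/myapp
ProxyPassReverse /myapp http://localhost:8080/myapp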

Review your AJP configuration between Apache and JBoss, as described in https://developer.jboss.org/wiki/OptimalModjk12Configuration

The problem

JBoss Web's (Tomcat) server.xml AJP snippet:

<Connector port="8009" address="${jboss.bind.address}" protocol="AJP/1.3"
         emptySessionPath="true" enableLookups="false" redirectPort="8443" ></Connector>   Apache's httpd.conf:

<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
ServerLimit      256
MaxClients       256
MaxRequestsPerChild  4000
</IfModule>

The above configuration, under load, may cause mod_jk to be very slow and unresponsive, produce HTTP errors, and leave half-closed connections. These problems can arise because no connection timeouts are specified to take care of orphaned connections, no error-handling properties are defined in workers.properties, and no connection limits are set in Apache and Tomcat.

But this high number of threads could be from another source. As described here:

the most common scenario for a hanging Socket.read() is a high processing time or unhealthy state of your remote service provider. This means that you will need to communicate with the service provider support team right away in order to confirm if they are facing some slowdown condition on their system.

Your application server threads should be released once the remote service provider's problem is resolved, but quite often you will need to restart your server instances (Java VM) to clear all the hanging threads, especially if you are lacking a proper timeout implementation.

Other less common causes include:

  • Huge response data causing increased elapsed time to read / consume the Socket Inputstream e.g. such as very large XML data. This can be proven easily by analysing the size of the response data
  • Network latency causing increased elapsed time in data transfer from the service provider to your Java EE production system. This can be proven by running some network sniffer between your production server and the service provider and determine any major lag/latency problem

Whatever your problem is, the first thing to do is review your timeout configuration!

What can you do?

You need to adjust the configuration on both the JBoss and the Apache side.

JBoss side

The main concern with server.xml is setting connectionTimeout, which sets the SO_TIMEOUT of the underlying socket. So when a connection in Tomcat hasn't had a request in the amount of time specified by connectionTimeout, the connection dies off. This is necessary because if the connection hasn't been used for a certain period of time, there is a chance that it is half-closed on the mod_jk end.

If the connection isn't closed there will be an inflation of threads, which can over time hit the maxThreads count in Tomcat; at that point Tomcat will not be able to accept any new connections. A connectionTimeout of 600000 (10 minutes) is a good number to start out with. There may be a situation where the connections are not being recycled fast enough; in that case the connectionTimeout could be lowered to 60000, or 1 minute.

When setting connectionTimeout in Tomcat, mod_jk should also have connect_timeout/prepost_timeout set, which allows detecting that the Tomcat connection has been closed and prevents a retried request.
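For example, the equivalent workers.properties entries might look like this (node1 is just a placeholder worker name; on mod_jk 1.2.27+ the ping_mode/ping_timeout settings shown further down can be used instead):

worker.node1.connect_timeout=10000
worker.node1.prepost_timeout=10000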

The recommended value of maxThreads is 200 per CPU, so here we assume the server is a single-core machine. If it had been quad core, we could push that value to 800, or more, depending on RAM and other machine specs.

<Connector port="8009"
           address="${jboss.bind.address}"
           emptySessionPath="true"
           enableLookups="false"
           redirectPort="8443"
           protocol="AJP/1.3"
           maxThreads="200"
           connectionTimeout="600000"></Connector>

Apache side

workers.properties file

See comments inline.

worker.list=loadbalancer,status

worker.template.port=8009
worker.template.type=ajp13
worker.template.lbfactor=1
#ping_timeout was introduced in 1.2.27
worker.template.ping_timeout=1000
#ping_mode was introduced in 1.2.27, if not 
#using 1.2.27 please specify connect_timeout=10000 
#and prepost_timeout=10000 as an alternative
worker.template.ping_mode=A
worker.template.socket_timeout=10
#It is not necessary to specify connection_pool_timeout if you are running the worker mpm 
worker.template.connection_pool_timeout=600

#Referencing the template worker properties makes the workers.properties shorter and more concise
worker.node1.reference=worker.template
worker.node1.host=192.168.1.2

worker.node2.reference=worker.template
worker.node2.host=192.168.1.3

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=node1,node2
worker.loadbalancer.sticky_session=True

worker.status.type=status

The key point in the above workers.properties is that we've added limits for the connections mod_jk makes. With the base configuration, socket timeouts default to infinite. The other important properties are ping_mode and ping_timeout, which handle probing a connection for errors, and connection_pool_timeout, which must be set to the same amount of time as server.xml's connectionTimeout when using the prefork MPM (note that connection_pool_timeout is in seconds while connectionTimeout is in milliseconds, hence 600 versus 600000). When these two values match, after a connection has been inactive for x amount of time, the connection in mod_jk and Tomcat will be closed at the same time, preventing a half-closed connection.

Apache configuration

Note that maxThreads for the AJP connector should coincide with the MaxClients set in Apache's httpd.conf. MaxClients needs to be set in the correct MPM module in Apache.

This can be determined by running httpd -V:

# httpd -V

Server version: Apache/2.2.3
Server built:   Sep 11 2006 09:43:05
Server's Module Magic Number: 20051115:3
Server loaded:  APR 1.2.7, APR-Util 1.2.8
Compiled using: APR 1.2.7, APR-Util 1.2.7
Architecture:   32-bit
Server MPM:     Prefork
  threaded:     no
    forked:     yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/prefork"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=128
-D HTTPD_ROOT="/etc/httpd"
-D SUEXEC_BIN="/usr/sbin/suexec"
-D DEFAULT_PIDLOG="logs/httpd.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_LOCKFILE="logs/accept.lock"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"

This tells me the Server MPM is Prefork. This is not always 100% accurate, so you should also view the output of /etc/sysconfig/httpd to see if the following line is there: HTTPD=/usr/sbin/httpd.worker. If it is commented out you are running prefork; otherwise, if it is uncommented, you are running worker.
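For example, on a Red Hat style installation (the path is assumed; adjust for your distribution):

grep HTTPD /etc/sysconfig/httpd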

httpd.conf:

<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers   20
MaxClients       200
MaxRequestsPerChild  0
</IfModule>

Or, if Apache is using the worker MPM:

<IfModule worker.c>
StartServers         2
MaxClients         200
MinSpareThreads     25
MaxSpareThreads     75
ThreadsPerChild     25
MaxRequestsPerChild  0
</IfModule>

MaxRequestsPerChild is 0; this is the recommended value when using mod_jk, as mod_jk keeps persistent connections open. The key values in the above configuration are MaxClients and MaxRequestsPerChild; the rest of the values are left at their defaults. Note that MaxRequestsPerChild is recommended to be 0; however, the value may need to be greater than 0 if Apache is also used for other modules, especially in the case of resource leakage.

In the link above you can find further configuration to optimize this scenario even more.

Dherik