0

Did anyone come across the problem of hanging threads in tomcat7? During working hours we receive millions of HTTP requests on production server. Tomcat behaves perfectly fine for all the requests and responds within the specified time limits. Only once or twice a week some thread pauses for more than 10 seconds which causes delays in response.

The environment details are

  • Java 7
  • Tomcat 7
  • Max Memory allocated 8GB
  • Spring MVC implementation
  • CentOS 6.5.

Not sure whether the problem is related to Tomcat or Java or issue with the virtual machine. Please note that GC logging is enabled which shows pauses every minute for less than a second. I did not see any pause in the gc log with 10 second.

Application Logs

2015-12-01 14:28:41,696|lvl=INFO|cls=identity.service.IdentityService|thd=http-apr-8080-exec-6|mod=URS|opn=[modifyPostionAssignment]|ssoTicket=ESR - N/A|requestID=C6232437-0116-0401-1201-142827450001|deviceId=0.481455058954918820af268b0.6793986896475551|sessionId=JSESSIONID0.5594741341655929|tid=HG-INT-01-0GRZI7BhTreRniamQm0esA|msg=[maintainUserMasterData] START
2015-12-01 14:28:28,226|lvl=INFO|cls=identity.service.controller.IdentityServiceController|thd=http-apr-8080-exec-6|mod=URS|opn=[modifyPostionAssignment]|ssoTicket=ESR - N/A|requestID=C6232437-0116-0401-1201-142827450001|deviceId=0.481455058954918820af268b0.6793986896475551|sessionId=JSESSIONID0.5594741341655929|tid=HG-INT-01-0GRZI7BhTreRniamQm0esA|msg=[maintainUserMasterData] START

GC Logs

2015-12-01T14:28:41.461+0000: 500601.110: Total time for which application threads were stopped: 0.2703630 seconds
2015-12-01T14:28:40.705+0000: 500600.354: Total time for which application threads were stopped: 0.2837290 seconds
2015-12-01T14:28:39.473+0000: 500599.122: Total time for which application threads were stopped: 0.0011780 seconds
2015-12-01T14:28:39.123+0000: 500598.771: Total time for which application threads were stopped: 0.0866600 seconds
2015-12-01T14:28:35.918+0000: 500595.566: Total time for which application threads were stopped: 0.0001030 seconds
2015-12-01T14:28:35.918+0000: 500595.566: Total time for which application threads were stopped: 0.0001520 seconds
2015-12-01T14:28:35.917+0000: 500595.566: Total time for which application threads were stopped: 0.0005330 seconds
2015-12-01T14:28:35.914+0000: 500595.562: Total time for which application threads were stopped: 0.0001030 seconds
2015-12-01T14:28:35.914+0000: 500595.562: Total time for which application threads were stopped: 0.0002980 seconds
2015-12-01T14:28:35.913+0000: 500595.562: Total time for which application threads were stopped: 0.0012430 seconds
2015-12-01T14:28:34.468+0000: 500594.117: Total time for which application threads were stopped: 0.0001340 seconds
2015-12-01T14:28:34.468+0000: 500594.117: Total time for which application threads were stopped: 0.0001210 seconds
2015-12-01T14:28:34.468+0000: 500594.116: Total time for which application threads were stopped: 0.0001500 seconds
2015-12-01T14:28:34.467+0000: 500594.116: Total time for which application threads were stopped: 0.0012510 seconds
2015-12-01T14:28:29.479+0000: 500589.127: Total time for which application threads were stopped: 0.0012900 seconds
2015-12-01T14:28:27.979+0000: 500587.628: Total time for which application threads were stopped: 0.0607270 seconds
2015-12-01T14:28:27.101+0000: 500586.750: Total time for which application threads were stopped: 0.0982840 seconds

Tomcat access Logs:

[01/Dec/2015:14:28:41 +0000] remotehost=192.168.62.2 x-forwarded-for=192.168.61.11 method=POST uri=/ursbusservice/maintainUserMasterData/ qs= status=200 bytes=41365 respTime=4766 method = POST status = 200
[01/Dec/2015:14:28:41 +0000] remotehost=192.168.62.2 x-forwarded-for=192.168.61.26 method=POST uri=/ursbusservice/maintainUserMasterData/ qs= status=200 bytes=4041 respTime=13580 method = POST status = 200
[01/Dec/2015:14:28:40 +0000] remotehost=192.168.62.2 x-forwarded-for=192.168.61.13 method=POST uri=/ursbusservice/maintainUserMasterData/ qs= status=200 bytes=14204 respTime=1159 method = POST status = 200
[01/Dec/2015:14:28:39 +0000] remotehost=192.168.62.2 x-forwarded-for=192.168.61.12 method=POST uri=/ursbusservice/maintainUserMasterData/ qs= status=200 bytes=59554 respTime=13990 method = POST status = 200
[01/Dec/2015:14:28:35 +0000] remotehost=192.168.62.2 x-forwarded-for=192.168.61.10 method=POST uri=/ursbusservice/maintainUserMasterData/ qs= status=200 bytes=37664 respTime=186 method = POST status = 200
[01/Dec/2015:14:28:33 +0000] remotehost=192.168.62.2 x-forwarded-for=192.168.61.12 method=POST uri=/ursbusservice/maintainUserMasterData/ qs= status=200 bytes=42983 respTime=426 method = POST status = 200

Code snippet, IdentityServiceController.java:

@RequestMapping(value = RequestMappings.MAINTAINUSER_MASTER_DATA, method = RequestMethod.POST, consumes = IdentityServiceConstants.APPLICATION_JSON)
@ResponseBody
public UserMasterVO maintainUserMasterData(@RequestBody UserMasterVO userMasterVO)
{
    ContextInfo contextInfo = userMasterVO.getContextInfo();
    ServiceContext.set(contextInfo);
    dLogger.info(IdentityServiceConstants.START, contextInfo, new Object[] { IdentityServiceConstants.Methods.MAINTAIN_USERMASTER_DATA});
    if (userMasterVO.getUuid() != null && userMasterVO.getOperation() != null)
    {
        identityService.maintainUserMasterData(userMasterVO);
    }
    else
    {
        throw new IdentityServiceException();
    }
    return userMasterVO;
}

IdentityService.java:

public void maintainUserMasterData(UserMasterVO userMasterVO) {
    ContextInfo contextInfo = userMasterVO.getContextInfo();
    ServiceContext.set(contextInfo);
    dLogger.info(IdentityServiceConstants.START, contextInfo, new Object[] {IdentityServiceConstants.Methods.MAINTAIN_USERMASTER_DATA});
    //do some operations
    dLogger.info(IdentityServiceConstants.END, contextInfo, new Object[] {IdentityServiceConstants.Methods.MAINTAIN_USERMASTER_DATA});
}
eis
  • 51,991
  • 13
  • 150
  • 199
kk.
  • 3,747
  • 12
  • 36
  • 67
  • Try dumping all threads at the time when that happens and find out what the slow thread is waiting for – Jan Dec 22 '15 at 10:59
  • @Jan could you please provide me the details about dumping a thread? Please note that the thread does not pause forever. It responds after 10-15 seconds. – kk. Dec 22 '15 at 11:02
  • 1
    It's a bit worrying that you have production server with millions of requests but don't know how to take a thread dump. See [this thread](http://stackoverflow.com/questions/18573411/tomcat-thread-dump) for reference. – eis Dec 22 '15 at 11:05
  • Try to log the start and end time for each response along with additional data. This may help to sort out. – Shafi Dec 22 '15 at 11:05
  • Also, you should have gc logs available. I think your server might very well be doing garbage collection. – eis Dec 22 '15 at 11:10
  • @eis I have gc logs enabled which shows pauses throughout the day but the pauses are less than a second. I did not see any pause in the gc log with 10 second. Also, application responds to other threads during this period hence it can be said that its not a pause for entire JVM for 10 seconds duration. – kk. Dec 22 '15 at 11:16
  • Ok. That's valuable information you should add to the question. Your operating system details are something to add as well, to help with debugging instructions. – eis Dec 22 '15 at 11:25

1 Answers1

3

If it's as rare as you say then it could be almost anything.

Some places to look:

  • Is a GC happening?
  • Is there a pattern?
  • If there is a timing pattern check out what happens at that time.
  • How often does your anti-virus do a full scan?
  • If there's a database connection, how is it performing at that time?
  • Is there anyone logged into the system at the time?
  • Are there any system errors logged at the time?
  • How big is your log file? How many log files are there in the log folder?
OldCurmudgeon
  • 64,482
  • 16
  • 119
  • 213
  • GC is happening almost every second. The thread is getting into one method and printing the log the first statement. When the second method is called, the time taken to log the first statement is more than 10 seconds. It does not seem to be a problem with database. It is not pausing while doing database operation. No errors reported during this time. – kk. Dec 22 '15 at 11:26
  • @KrishnaKuntala - *the time taken to log the first statement is more than 10 seconds* - Maybe it is the logging that is the problem. Adding some more suggestions. – OldCurmudgeon Dec 22 '15 at 11:38
  • @KrishnaKuntala you should add the content of those methods also the the question. – eis Dec 22 '15 at 11:40
  • Thanks @eis. Updated my question. – kk. Dec 22 '15 at 12:17
  • It does not seem to be a problem with just logging. We have spikes in response times as well during this thread response as you can see in Tomcat access logs. – kk. Dec 22 '15 at 15:21