
I've been using the Java Service Wrapper in a custom application for quite a while, and it has been working fine. Since updating our application to a new version in the last few days, the JVM has started hanging, after which the wrapper prints this in the log: `JVM appears hung: Timed out waiting for signal from JVM.`

It then automatically terminates the JVM and starts the app again. This happens after about 10 hours of running, which just makes it harder to debug.

Of course I am going to look through the changes that we've made, but no major changes were made that I would suspect of causing this type of problem.

Where can I look to try to figure out what is happening? Debug messages from the application don't indicate anything interesting. When the JVM crashes outright it usually creates a dump, which can help with debugging, but since it's hanging, no dump is created. If I configure the wrapper not to restart the service automatically, is there anything I can do to get some useful information out of the JVM before restarting it?

It seems to me that the JVM shouldn't hang from typical programming errors. What have you run into before that can cause the JVM to hang?


4 Answers


Read up on the `wrapper.ping.timeout` property. The wrapper software communicates with your JVM every so often to make sure that it is alive. If that communication fails for whatever reason, the wrapper deems the process hung and attempts to restart it.

Depending on how your application is architected, your JVM might be busy processing something else when the wrapper tries to "ping" it.
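For reference, a minimal sketch of the relevant `wrapper.conf` settings (the values shown are the documented defaults, not recommendations; tune them for your workload):

```ini
# Seconds the wrapper waits for a ping response before it
# declares the JVM hung and restarts it (0 disables the check).
wrapper.ping.timeout=30

# Seconds between pings from the wrapper to the JVM.
wrapper.ping.interval=5

# Useful while debugging: keep the wrapper from restarting
# the JVM so its state can be inspected after a hang.
wrapper.disable_restarts=TRUE
```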

Kevin
  • Increasing the property can cause wrapper to not notice the problem and not restart the app, but that's just a work-around for the issue. During the time that it's hanging it would not respond to client requests, which is not good either. – Sarel Botha Mar 01 '09 at 23:20

See if you can use VisualVM to see what is going on. Have VisualVM monitor the app the whole time, and when the app stops working you may be able to determine what is wrong.

If the VM hangs you can still get the state of the threads... I think VisualVM will make that a bit easier, given your setup, than the usual Ctrl+Break (or whatever the key combination is).
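As a rough in-process equivalent, here is a minimal sketch (standard library only, class name is mine) that prints the same per-thread stack information a Ctrl+Break / `kill -QUIT` thread dump shows:

```java
import java.util.Map;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        // Iterate over every live thread and print its state and stack,
        // similar in spirit to the dump Ctrl+Break produces.
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            System.out.printf("\"%s\" daemon=%b state=%s%n",
                    t.getName(), t.isDaemon(), t.getState());
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

You could run this from a watchdog thread, or trigger it over RMI, to capture thread state just before the wrapper's timeout fires.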

(Edit based on comment)

> Tried this. Last time it hung the number of threads and the amount of memory in use were quite low, so neither of those are causing the problem. Unfortunately after it hangs and wrapper terminates it you can't get a thread dump.

Is there any way you can run it without the wrapper to debug it? Also, if you use the NetBeans profiler, it might give you a chance to deal with the app when it stops (I'll check later today and see if I can find out whether that would behave differently).

TofuBeer
  • Tried this. Last time it hung the number of threads and the amount of memory in use were quite low, so neither of those are causing the problem. Unfortunately after it hangs and wrapper terminates it you can't get a thread dump. – Sarel Botha Mar 04 '09 at 13:57

In my case the cause was having a couple of different versions of a library (JBPM) on the classpath. The wrapper lets you use wildcards to include jars; be careful with this, though, as you may accidentally include more than you intended.

Here is an IBM article that gives information on debugging hangs in Java. It basically says that there are two things that can cause hangs:

  1. An infinite loop,
  2. A deadlock.
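As an illustration of the second cause, here is a self-contained sketch (class and method names are mine) that manufactures a classic lock-ordering deadlock and then detects it with the JDK's `ThreadMXBean`, which performs the same cycle detection a thread dump reports:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class DeadlockDemo {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    // Start a daemon thread that takes 'first', pauses, then tries 'second'.
    private static void spawn(Object first, Object second) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                try { Thread.sleep(100); } catch (InterruptedException ignored) { }
                synchronized (second) { /* never reached once deadlocked */ }
            }
        });
        t.setDaemon(true); // daemon, so the JVM can still exit afterwards
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        spawn(lockA, lockB); // acquires in order A, then B
        spawn(lockB, lockA); // acquires in order B, then A -> deadlock
        Thread.sleep(500);   // give both threads time to block on each other

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        System.out.println(ids == null
                ? "no deadlock"
                : ids.length + " threads deadlocked");
        // prints "2 threads deadlocked"
    }
}
```

An infinite loop looks different in a dump: one thread stays `RUNNABLE` on the same frames across repeated dumps, whereas deadlocked threads show `BLOCKED` waiting on each other's monitors.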

Since then I've had to debug other hanging issues. On Linux you can send the JVM the QUIT signal to make it write a thread dump to the console, which really helps in figuring out where the issue is: `kill -QUIT <PID>`

Edit 6/13/2017

These days I use jmap, included in the JDK, to dump the entire memory of the program, then use Eclipse Memory Analyzer to see the exact state of the program at the moment of the dump. You can look at the list of active threads and then inspect the variables in each stack frame.

/usr/java/latest/bin/jmap -dump:file=/tmp/app-crash.hprof <PID>

Where `<PID>` is the process ID of the Java process.

Sarel Botha
  • (the link is broken. After searching around in web archive I gather that the article name [should](https://web.archive.org/web/20140530051250/http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?topic=%2Fcom.ibm.java.doc.diagnostics.142j9%2Fhtml%2Fjvmtrdrcontrol.html&path=1_5_10_3_1) be "Java Diagnostics Guide 1.4.2 → Problem determination → Windows problem determination → Debugging hangs", [...] – user202729 Feb 24 '23 at 04:38
  • [...] while version 1.4.2 appears to be [impossible](https://www.ibm.com/support/pages/java-diagnostics-guides-information-centers) [to](https://www.ibm.com/docs/en/sdk-java-technology?topic=SSYKE2/earlier_releases/earlier_releases.html) download, version 6.0 is [available](https://web.archive.org/web/20230224042430/https://www.ibm.com/docs/en/SSYKE2/earlier_releases/6/pdf/en/diag60.pdf) and has a corresponding section. – user202729 Feb 24 '23 at 04:39

What environment are you in? OS, JVM version, hardware architecture?

That does sound like a bug, and given that it takes many hours, it sounds like a resource-exhaustion bug of some sort.

Charlie Martin
  • Linux RHEL4, Java 1.6.0, Intel 32 bit. I'm monitoring the number of threads and memory usage; so far it's not using much of either. We don't use threads much. We just fire up a few at application start time to look every few minutes to see if there's something to process. – Sarel Botha Mar 01 '09 at 22:16
  • how consistent is the 10 hours? There are really only a couple of possibilities: either *some* resource exhaustion, or a random deadlock/delay. What kind of application? – Charlie Martin Mar 01 '09 at 22:32
  • Not consistent at all. So far it's only happened three times. Actually it looks like the average is a bit higher than 10 hours. It happened at 12, 14 and 19 hours. What I'm going to try and do is set it to not auto restart so I can investigate the state a bit when it happens. – Sarel Botha Mar 01 '09 at 23:12
  • Okay, then that argues for a random event. I'd be tempted instead to set the ping interval to be as short as possible so you can repeat it more quickly. – Charlie Martin Mar 01 '09 at 23:16
  • It has also been occurring at random times. We have some jobs running at fixed times so it's not them. This is a stand-alone Java server application. It accepts RMI calls from clients. It communicates with a database via Hibernate. – Sarel Botha Mar 01 '09 at 23:19
  • What database? I've had trouble with Postgres taking itself off on a side jaunt cleaning itself up, and thus periodically introducing a wait. – Charlie Martin Mar 01 '09 at 23:41
  • This is MySQL. That shouldn't cause a problem in the Java process though. One thread waiting for a database response shouldn't hang the whole JVM. – Sarel Botha Mar 02 '09 at 00:20
  • A recent version of RHEL4? A few years ago Red hat had a bug where the OS dropped notifies. – Tom Hawtin - tackline Mar 02 '09 at 13:02