31

My application is using Gson 2.2 for converting POJOs to JSON. While running a load test I stumbled upon a lot of threads blocked in the Gson constructor:

"http-apr-28201-exec-28" #370 daemon prio=5 os_prio=0 tid=0x0000000001ee7800 nid=0x62cb waiting for monitor entry [0x00007fe64df9a000]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at com.google.gson.Gson.<init>(Gson.java:200)
    at com.google.gson.Gson.<init>(Gson.java:179)

The thread dump does NOT show any threads holding the [0x00007fe64df9a000] monitor. How can I find out who holds it?

The Gson code at line 200 looks pretty innocent:

// built-in type adapters that cannot be overridden
factories.add(TypeAdapters.STRING_FACTORY);
factories.add(TypeAdapters.INTEGER_FACTORY);

I'm using JRE 1.8.0_91 on Linux.
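
For context, the serialization is done in a servlet, roughly along the lines of the sketch below. This is not my actual code (names and structure are made up for illustration); the relevant part is that a new Gson instance is constructed on the request thread, which matches the Gson.<init> frames on the http-apr exec threads:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.gson.Gson;

// Illustrative only -- not the real servlet; names are hypothetical.
public class ReportServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        Gson gson = new Gson();              // a new instance per request
        SomePojo pojo = loadPojo(req);       // hypothetical helper
        resp.setContentType("application/json");
        resp.getWriter().write(gson.toJson(pojo));
    }

    private SomePojo loadPojo(HttpServletRequest req) {
        return new SomePojo();               // placeholder
    }

    private static class SomePojo {
    }
}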

bedrin
  • have you tried [eclipse mat](http://www.eclipse.org/mat/)? – fhofmann May 18 '16 at 19:09
  • @fhofmann I don't have any issues with heap - how can it help me? – bedrin May 18 '16 at 19:11
  • You are right. I had a similar problem some time ago and used Eclipse MAT, a profiler (YourKit) and a specific parser to discover that the problem was a Hashtable (Properties) used by the API configuration. MAT helped me find the relevant classes. – fhofmann May 18 '16 at 19:28
  • Can you post the full thread dump? – Matt McHenry May 21 '16 at 02:28
  • Could it be related to the fact that "Gson instances are Thread-safe so you can reuse them freely across multiple threads"? Are you testing via servlets or POJOs? – Mitja Gustin May 21 '16 at 20:04
  • @MitjaGustin via Servlets – bedrin May 22 '16 at 08:55
  • @MattMcHenry I'm not sure I can - I'll have to remove my own classes from it anyway due to my employer's policies – bedrin May 22 '16 at 08:58
  • Are there any threads holding monitors? – Nikem May 22 '16 at 14:58
  • @Nikem there are no threads holding 0x00007fe64df9a000 according to the thread dump; but of course there are some threads holding other monitors – bedrin May 22 '16 at 16:59
  • Most likely either `factories.add` is synchronized or otherwise holds a monitor, or it calls other code that does. That might give you a place to start looking. I'd need to know what type `factories` is and maybe have code for it to provide more detailed suggestions. – Warren Dew May 22 '16 at 18:32
  • I found [this](https://dzone.com/articles/how-analyze-java-thread-dumps) link with a nice explanation of the topic "thread dump analysis". Look at section "When Unable to Obtain a Lock (BLOCKED)". There should be a line like "- waiting to lock <0x0000000780a000b0> " in your thread dump of the blocked thread. – Steffen Harbich May 23 '16 at 09:06
  • Do you have a warmup phase in your load test? If so how many threads do you use during this warmup phase and how many during the normal phase? Do you get this issue all the time or at a given moment of your load test? – Nicolas Filotto May 24 '16 at 17:57

2 Answers

16

tl;dr I think you are running into GC-related behavior, where threads are being put in a waiting state to allow for garbage collection.


I do not have the whole truth but I hope to provide some pieces of insight.

The first thing to realize is that the number in brackets, [0x00007fe64df9a000], is not the address of a monitor. The number in brackets can be seen for all threads in a dump, even threads that are in the running state. The number also does not change. Example from my test dump:

main" #1 prio=5 os_prio=0 tid=0x00007fe27c009000 nid=0x27e5c runnable [0x00007fe283bc2000]
   java.lang.Thread.State: RUNNABLE
        at Foo.main(Foo.java:12)

I am not sure what the number means, but this page hints that it is:

... the pointer to the Java VM internal thread structure. It is generally of no interest unless you are debugging a live Java VM or core file.

The format of the trace explained there is a bit different, though, so I am not sure I am correct.

This is how a dump looks when the address of the actual monitor is shown:

"qtp48612937-70" #70 prio=5 os_prio=0 tid=0x00007fbb845b4800 nid=0x133c waiting for monitor entry [0x00007fbad69e8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
        - waiting to lock <0x00000005b8d68e90> (a java.lang.Object)

Notice the waiting to lock line in the trace and that the address of the monitor is different from the number in brackets.

The fact that we cannot see the address of the monitor involved indicates that the monitor exists only in native code.

Secondly, the Gson code involved does not contain any synchronization at all. The code just adds an element to an ArrayList (assuming no bytecode manipulation has been done and nothing fishy is being done at low level). I.e., it would not make sense to see the thread waiting for a standard synchronization monitor at this call.

I found some indications that threads can be shown as waiting for a monitor entry when there is a lot of GC going on.

I wrote a simple test program to try to reproduce it by just adding a lot of elements to an array list:

List<String> l = new ArrayList<>();
while (true) {
    for (int i = 0; i < 100_100; i++) {
        l.add("" + i);
    }
    l = new ArrayList<>();
}

Then I took thread dumps of this program. Occasionally I ran into the following trace:

"main" #1 prio=5 os_prio=0 tid=0x00007f35a8009000 nid=0x12448 waiting on condition [0x00007f35ac335000]
   java.lang.Thread.State: RUNNABLE
      at Foo.main(Foo.java:10)   <--- Line of l.add()

While not identical to the OP's trace, it is interesting to have a thread waiting on condition when no synchronization is involved. I experienced it more frequently with a smaller heap, indicating that it might be GC related.

Another possibility could be that code that contains synchronization has been JIT compiled and that prevents you from seeing the actual address of the monitor. However, I think that is less likely since you experience it on ArrayList.add. If that is the case, I know of no way to find out the actual holder of the monitor.
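
If you want more evidence for or against the GC theory, one option is to log the JVM's GC counters from inside the application while the load test runs. Below is a minimal sketch using the standard GarbageCollectorMXBean API; the class name, output format and one-second interval are arbitrary choices of mine, and GC logs (-verbose:gc) or a profiler would give you the same information:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public final class GcWatcher {

    // Call once at application startup; prints GC counters every second.
    // Sudden jumps in count/time while request threads appear BLOCKED
    // would support the GC explanation.
    public static void start() {
        Thread t = new Thread(() -> {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s: count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                try {
                    Thread.sleep(1000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "gc-watcher");
        t.setDaemon(true);
        t.start();
    }
}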

K Erlandsson
  • Interesting! I agree - it looks like this behavior is caused by GC, especially since I'm doing a load test. I'll try to get some more telemetry from the VM to prove it – bedrin May 23 '16 at 08:36
  • You were right! I started the load test against a server with a profiler attached and saw a lot of major GC events caused by the massive amount of requests (that's why we need load tests!). I think that if I had increased the load I would have ended up with "GC overhead limit exceeded" exceptions, but apparently my load was not big enough – bedrin May 24 '16 at 20:27
-1

If you don't have GC issues, then maybe there actually is some thread which has acquired a lock on an object, and the stuck thread is waiting to acquire a lock on the same object. The way to figure it out is to look for

- waiting to lock <some_hex_address> (a <java_class>)

An example would be:

- waiting to lock <0x00000000f139bb98> (a java.util.concurrent.ConcurrentHashMap)

in the thread dump for the entry which says waiting for monitor entry. Once you have found it, you can search for the thread that has already acquired a lock on the object with address <some_hex_address>; for the example above it would look something like this:

- locked <0x00000000f139bb98> (a java.util.concurrent.ConcurrentHashMap)

Now you can look at the stack trace of that thread to figure out which line of code acquired the lock.
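
To see what this pattern looks like in practice, here is a toy program (class and thread names are made up for illustration) that produces exactly that pair of lines; take a thread dump with jstack while it is running:

public class LockContention {

    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        // "holder" grabs the monitor and keeps it.
        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        holder.start();
        Thread.sleep(200); // give "holder" time to own the monitor first

        // "waiter" blocks trying to enter the same monitor.
        Thread waiter = new Thread(() -> {
            synchronized (LOCK) {
                // never reached while "holder" sleeps
            }
        }, "waiter");
        waiter.start();

        // A thread dump taken now shows "waiter" as BLOCKED with
        // "- waiting to lock <addr> (a java.lang.Object)" and "holder" with
        // "- locked <addr>" for the same address in its stack trace.
        holder.join(); // demo runs until killed
    }
}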

Parag