11

I have implemented a simple profiler with JVMTI to display the invocation on wait() and notifyAll(). As a test case I am using the. producer consumer example of Oracle. I have the following three events:

  • notifyAll() is invoked
  • wait() is invoked
  • wait() is left

The wait() invocation and when its left it profiled by using the events MonitorEnter and MonitorExit. The notifyAll() invocation is profiled when a method with name notifyAll is exited.

Now I have the following results, the first is from the profiler itself and the second is from Java, where I have placed the appropriate System.out.println statement.

    // Profiler:
    Thread-1 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-1 invoked wait()
    Thread-1 left wait()
    Thread-1 invoked notifyAll()
    Thread-1 invoked wait()
    Thread-1 left wait()
    Thread-1 invoked notifyAll()
    Thread-1 invoked wait()
    Thread-1 left wait()
    Thread-1 invoked notifyAll()

    // Java:
    Thread-0 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-1 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-1 invoked wait()
    Thread-1 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-1 invoked wait()
    Thread-1 invoked notifyAll()
    Thread-0 invoked notifyAll()
    Thread-1 invoked wait()
    Thread-1 invoked notifyAll()

Has someone an explanation from what this discrepancy comes from? notifyAll() is called so many times. I was told this might be due to false-positive responses from the request of Java to the operating system.

A notifyAll() request it sent to the operating system and a false-positive response is sent, where it seems like the request was sucessfull. Since notifyAll is logged by profiling method invocation instead of MonitorEnter it could explain why this does not happen with wait.

I forgot to say, I didn't run the programs separately, both logs are from the same execution.

Aditional Information

Originally added as an answer, moved to the question by extraneon:

I think I found out where some of the additional notifyAll are coming from, I added the profiling of the method context in which notifyAll is called:

723519: Thread-1 invoked notifyAll() in Consumer.take
3763279: Thread-0 invoked notifyAll() in Producer.put
4799016: Thread-0 invoked notifyAll() in Producer.put
6744322: Thread-0 invoked notifyAll() in Producer.put
8450221: Thread-0 invoked notifyAll() in Producer.put
10108959: Thread-0 invoked notifyAll() in Producer.put
39278140: Thread-1 invoked notifyAll() in java.util.ResourceBundle.endLoading
40725024: Thread-1 invoked notifyAll() in java.util.ResourceBundle.endLoading
42003869: Thread-1 invoked notifyAll() in java.util.ResourceBundle.endLoading
58448450: Thread-1 invoked notifyAll() in java.util.ResourceBundle.endLoading
60236308: Thread-1 invoked notifyAll() in java.util.ResourceBundle.endLoading
61601587: Thread-1 invoked notifyAll() in java.util.ResourceBundle.endLoading
70489811: Thread-1 invoked notifyAll() in Consumer.take
75068409: Thread-1 invoked wait() in Drop.take
75726202: Thread-1 left wait() in Drop.take
77035733: Thread-1 invoked notifyAll() in Consumer.take
81264978: Thread-1 invoked notifyAll() in Consumer.take
85810491: Thread-1 invoked wait() in Drop.take
86477385: Thread-1 left wait() in Drop.take
87775126: Thread-1 invoked notifyAll() in Consumer.take

But even without these external calls, there are plteny of notifyAll calls which do not show up in the printf debugging.

extraneon
  • 23,575
  • 2
  • 47
  • 51
Konrad Reiche
  • 27,743
  • 15
  • 106
  • 143
  • Using system.out/err to trace is a horrid idea when you deal w/ concurrent code. Logging/System.out/err just impose explicit synchronization + extra cache coherency. Show the real code that causes the issue and I will see what it's possible to figure out. – bestsss May 07 '11 at 17:50
  • @bestsss I am not talking about the sequence of events displayed, only the total number of calls made to notifyAll. Does your argument still holds then? The real code is linked above, it's an example of Oracle The Java Tutorials. – Konrad Reiche May 08 '11 at 07:20
  • it is only normal to have many more notifies for the simple reason, not each 'notify' will wake find the other thread in 'wait' state. Adding the trace possibly worsen the issue though. I find the behavior totally normal. I do use CAS (compare and set/swap) to avoid entering sync blocks and calling notify extensively. – bestsss May 08 '11 at 12:44
  • @bestsss: Do you mean, that one call to notifyAll() might look like one (printf-debugging), but is actually even more often (profiling)? Someone told me, it might be natural the system calls to implement notifyAll() fail, so it has to be called again. This would match up with your argument 'not each thread is found to wake up'. I was told to look into condition variables in the pthread manpages. I've looked into it, but I couldn't find any error states when pthread_cond_broadcast is called. Even though it seems natural, I would like to have a documentation somewhere why this happens. – Konrad Reiche May 08 '11 at 22:16

2 Answers2

4

I spent some time analyzing the Producer-Consumer example provided by Oracle and your output (profiler and Java program). There are some strange things on your outputs besides the several unexpected notifyAll():

  1. we should expect the wait() method to execute 4 times (the String array manipulated by the producer has 4 elements). The result of your profiler shows that it only executed three times.

  2. Another thing that is quite strange is the numbering of the threads in the output of the profiler. The example has two threads, however your profiler executes all the code in one thread, i.e. Thread-1, while Thread-0 just executes notifyAll().

  3. The example code provided is correctly programmed from a concurrent perspective and language perspective: wait() and notifyAll() are in synchronized methods to ensure control over the monitor; wait condition is within a while loop with the notifies correctly placed in the end of the methods. However, I noticed that the catch (InterruptedException e) block is empty, which means that if the thread that is waiting is interrupted, the notifyAll() method will be executed. This can be a cause for the several unexpected notifyAll().

In conclusion, without performing some modifications in the code and perform some extra tests, it won't be easy to figure out where the problem comes from.

As a side note, I'll leave this link Creating a Debugging and Profiling Agent with JVMTI for the curious ones that want to play with JVMTI.

CRM
  • 4,569
  • 4
  • 27
  • 33
  • to your second point: What do you mean with the profiler executes all the code in one thread? This is not happening, the Java program is launched as any other program, the only addition in the injected profiling agent. – Konrad Reiche May 03 '11 at 17:46
  • @platzhirsch In the profiler output, you have two threads: `Thread-0` and `Thread-1`. But all the application output is printed by `Thread-1`. Well, I'm assuming this because `Thread-0` does not output any `wait` `String`. – CRM May 03 '11 at 17:49
1

If you have a race condition in your code, a profiler can slow down the code enough to either show or hide an error in your code. (I like to run my program in a profiler, just to show up race conditions.)

As notifyAll() will only notify wait()ing threads, calling wait() after the notifyAll() is likely to result in missing the notify. i.e. its stateless, it doesn't know you called notify before.

If you slow down your application the notifyAll() can be delayed until after the wait() starts.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • 1
    @Peter-Layrey: The results are from one and the same execution, which means the Java output is also the code which is slown down by the injected profiler agent. So this does not explain why notifyAll is so often called in the beginning. – Konrad Reiche May 03 '11 at 12:43
  • That is true. I am surprised it is from the same run given the first one "wait left" is printed but in the second it is not. Do you have any thoughts on how this is possible? – Peter Lawrey May 03 '11 at 13:00
  • @Peter-Lawrey: Because I forgot to place the println for left wait(), this is normal - I should have included it. However the focus is on notifyAll or do you think this has something to do with it? – Konrad Reiche May 03 '11 at 13:02
  • Possibly not, it just confused me. I don't know why notifyAll appears to be called more times than it actually is. Perhaps you are seeing a notifyAll() on each object waiting (by the number of notifiees) – Peter Lawrey May 03 '11 at 13:20