Call to System.nanoTime() in busy wait takes much longer time than usual randomly

Question

I need to send tcp packages to a server repeatedly. The sending speed is 800 msgs per second, and every msg should be sent in a uniform speed, 1.25ms. I caculate the time used by sending, and sleep the thread for 1.25ms - sendingTime using Thread.sleep(). But I found Thread.sleep() will take at least 1ms on my machine even if I passed 1ns to it. After some searching on the web, I changed the sleep method to busy wait:

long delayNanoSec = 1250000;

long endTime = System.nanoTime() + delayNanoSec;
while (System.nanoTime() < endTime);

This runs well when the delayNanoSec < 12500 or > 12500000, it costs some time around delayNanoSec, but when set to 1250000, it will cost much longer than delayNanoSec randomly. I changed the code to record time used by System.nanoTime() in every loop:

long[] timeUsed = new long[50000];

long delayNanoSec = 1250000;

long endTime = System.nanoTime() + delayNanoSec;
long lastTime = 0;
int index = 0;
do {
    lastTime = System.nanoTime();
    timeUsed[index++] = lastTime;
}while(lastTime < endTime);

for(int i = 0; i < times-1; i++) {
    System.out.println("INDEX: " + i + " TIME USED: " + (nanos[i+1] - nanos[i]));
}

I got something like this:

INDEX: 1729 TIME USED: 226
INDEX: 1730 TIME USED: 387
INDEX: 1731 TIME USED: 240
INDEX: 1732 TIME USED: 240
INDEX: 1733 TIME USED: 219
INDEX: 1734 TIME USED: 246
INDEX: 1735 TIME USED: 289
INDEX: 1736 TIME USED: 283
INDEX: 1737 TIME USED: 245
INDEX: 1738 TIME USED: 244
INDEX: 1739 TIME USED: 4254974

After testing this code several times, it looks like System.nanoTime() will cost a very long time to return randomly. In most cases, it costs about 250ns, but sometimes it costs more than 3-4ms.

Does anyone know the reason about this? Or is there a better way to do this job? (Control the time between every sending)

BTW, sorry for my English, hope you understand what I mean.

Do not expect real-time behavior from JVM. JVM may pause an application at any time to perform GC, deoptimization, biased lock revocation, code cache cleanup or other maintenance task. — apangin, Jul 23 '17 at 00:39
Is the longer time always on the last iteration of the loop, when it breaks out because of going past `endTime`? — Chris Nauroth, Jul 23 '17 at 06:21
@ChrisNauroth it happens in the last call most time, and cost more than 1ms. and happens in the middle sometime, but then it usually cost not more than 90000ns. When the delay is less than 12500 or greater than 12500000, it happens barely. I tried jdk 6 and jdk 8, get the same result. — Yang Hu, Jul 24 '17 at 00:04
@apangin I wrote a test class including only a busy wait to test this. There are only a primitive int type to store the time, and a while loop which call the nanoTime method repeatedly. Very simple and no object creation code wrote by myself, and it always happens when the delay is in the range described in my question. This feels weird. — Yang Hu, Jul 24 '17 at 00:14
Again, JVM may pause an application *almost at any time* not only for GC, but for many other reasons. In your case the delay is likely to be caused by JIT compilation that starts after a certain number of loop iterations. Try running in interpreted mode: `-Xint` — apangin, Jul 24 '17 at 00:50
Even an OS does not give you real-time guarantees. For example, it may migrate a thread from one CPU to another, and this will also result in a delay. — apangin, Jul 24 '17 at 00:59
@apangin thanks for your advice, have tried xint option, this reduces the frequency it appears, but dose not eliminate it. — Yang Hu, Jul 24 '17 at 01:17
@apangin I agree with you, now I wonder know what really happened at that moment, do you know what tools/techniques should I use to inspect that? — Yang Hu, Jul 24 '17 at 01:33
@ChrisNauroth Thanks for your answer, especially the detail of your analysis, this is what I want to know. Thanks again. — Yang Hu, Jul 24 '17 at 12:14

score 0 · Accepted Answer · answered Jul 24 '17 at 06:42

Summary

Your code sample constitutes a JVM micro-benchmark, but you have not controlled for external factors. In particular, my tests indicate that JIT compilation is still quite busy during the test run. This particular issue can be addressed by running a sufficient warm-up phase before taking any measurements.

Details

It's notoriously tricky to micro-benchmark code running on the JVM, because there are multiple confounding factors that can occur non-deterministically, outside the control of the code under test. These factors include garbage collection, class loading, static initialization of classes and JIT compilation. Each of these occur without direct invocation by the code being measured, and they all can result in what looks like extra time spent executing the code under test.

There is an excellent prior question on the topic of micro-benchmarking on the JVM.

How do I write a correct micro-benchmark in Java?

Note in particular Rule 1 about a warmup phase, recommending "...[the] rule of thumb is several tens of thousands of inner loop iterations."

I decided to apply these techniques to your code sample. First, I simplified the test code to this:

class TestNano {

    public static void main(String[] args) throws Exception {
        long[] nano = new long[10000];

        for (int i = 0; i < nano.length; ++i) {
            nano[i] = System.nanoTime();
        }

        for (int i = 0; i < nano.length; ++i) {
            System.err.println("INDEX: " + i
                    + " NANO: " + nano[i]
                    + " TIME USED: " + (i == 0 ? 0 : (nano[i] - nano[i - 1])));
        }
    }
}

I executed it using this command:

java -XX:+UnlockDiagnosticVMOptions \
        -XX:CompileCommand=print,*.* \
        -XX:+PrintAssembly \
        -Xbatch \
        -XX:CICompilerCount=1 \
        -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps \
        -verbose:gc \
        TestNano 2>&1 | tee TestNano.out

Note that use of System.err in the code is significant. The extra JVM diagnostic information prints to stderr, and I want that output and our own "INDEX:..." output in the same file, so that we can see how JVM activity coincides with the loop iterations.

This produced some results similar to yours:

INDEX: 9976 NANO: 1500876274305377000 TIME USED: 0
INDEX: 9977 NANO: 1500876274305377000 TIME USED: 0
INDEX: 9978 NANO: 1500876274305377000 TIME USED: 0
INDEX: 9979 NANO: 1500876274307294000 TIME USED: 1917000
INDEX: 9980 NANO: 1500876274307295000 TIME USED: 1000
INDEX: 9981 NANO: 1500876274307295000 TIME USED: 0
INDEX: 9982 NANO: 1500876274307295000 TIME USED: 0

INDEX: 14560 NANO: 1500876274307599000 TIME USED: 0
INDEX: 14561 NANO: 1500876274307599000 TIME USED: 0
INDEX: 14562 NANO: 1500876274307599000 TIME USED: 0
INDEX: 14563 NANO: 1500876274312344000 TIME USED: 4745000
INDEX: 14564 NANO: 1500876274312358000 TIME USED: 14000
INDEX: 14565 NANO: 1500876274312358000 TIME USED: 0
INDEX: 14566 NANO: 1500876274312358000 TIME USED: 0

I'm always suspicious of garbage collection when I see this kind of pattern, but in this case, I saw no evidence of garbage collection from the verbose GC logging that I enabled.

Instead, every time I had a multi-millisecond blip, it roughly corresponded with prints of compilation information, which looks similar to this:

Compiled method (c2)     647   64             java.lang.StringBuilder::toString (17 bytes)
 total in heap  [0x000000010aae8610,0x000000010aae8d80] = 1904
 relocation     [0x000000010aae8730,0x000000010aae8760] = 48
 main code      [0x000000010aae8760,0x000000010aae8a40] = 736
 stub code      [0x000000010aae8a40,0x000000010aae8a58] = 24
 oops           [0x000000010aae8a58,0x000000010aae8a80] = 40
 scopes data    [0x000000010aae8a80,0x000000010aae8b60] = 224
 scopes pcs     [0x000000010aae8b60,0x000000010aae8d20] = 448
 dependencies   [0x000000010aae8d20,0x000000010aae8d28] = 8
 handler table  [0x000000010aae8d28,0x000000010aae8d70] = 72
 nul chk table  [0x000000010aae8d70,0x000000010aae8d80] = 16
Decoding compiled method 0x000000010aae8610:
Code:
[Entry Point]
[Constants]
  # {method} 'toString' '()Ljava/lang/String;' in 'java/lang/StringBuilder'
  #           [sp+0x50]  (sp of caller)
  0x000000010aae8760: mov    0x8(%rsi),%r10d
  0x000000010aae8764: shl    $0x3,%r10
  0x000000010aae8768: cmp    %r10,%rax

I saw significant JIT compilation activity happening intermittently within the first 20,000 loop iterations. After that, I did not see compilation activity, and the TIME USED was consistently sub-millisecond.

All of the above comes with the caveat that your results may vary. Performance analysis is often particular to the environment executing the test. In my case, I was running the test on a Mac with this JVM version:

java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

Nonetheless, I think we have sufficient evidence here that the root cause was insufficient warmup time causing stalls due to JIT compilation.

A key conclusion for your program is that you cannot rely on the JVM for exact execution to a deadline, especially such a tight deadline measured in nanoseconds. As a commenter stated, you will not get real-time behavior. It might work fine most of the time, but there will outliers of longer latency like this. (Depending on your application, an occasional stall like this might be acceptable.)

Call to System.nanoTime() in busy wait takes much longer time than usual randomly

1 Answers1

Summary

Details