4

I don't want to change this code, I'm only interested in JVM, OS or kernel customization/configuration for best results!


I have a one-second loop (1000 x 1 ms):

public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();
    for (int i = 0; i < 1000; i++ ) {
        Thread.sleep(TimeUnit.MILLISECONDS.toMillis(1));
    }
    long duration = System.nanoTime()  - start;
    System.out.println("Loop duration " + 
         duration / TimeUnit.MILLISECONDS.toNanos(1) + " ms.");
}

On my Fedora 20 with kernel 3.12 this loop needs 1055 ms.

This is a pretty good result; the average is more than 1100 ms.

Is it possible to make this code faster with custom JVM flags or OS configuration?

Loop duration 1055 ms.
MariuszS
  • 10
    What are you trying to accomplish here? Reduce the overhead associated with the sleep call? – JVMATL Jan 17 '14 at 18:56
  • 2
    What do you want to achieve? – Bhavik Ambani Jan 17 '14 at 18:57
  • I think this is not related to the sleep call. Is your result from this method similar to mine? – MariuszS Jan 17 '14 at 18:57
  • This is a normal benchmark by design. :) – MariuszS Jan 17 '14 at 18:58
  • I'm interested in JVM flags for low latency code execution or OS configuration. – MariuszS Jan 17 '14 at 19:00
  • It is unlikely you are getting anything remotely resembling an accurate measurement using the system clock – Brian Roach Jan 17 '14 at 19:01
  • I think everybody's result will be similar: each sleep will sleep at least 1ms, plus a little slop due to system timers and contention from other threads. You could make it faster by changing `i < 1000` to `i < 500`, but I'm pretty sure that's not what you mean.. :) – JVMATL Jan 17 '14 at 19:02
  • 2
    @MariuszS For low latency, you wouldn't call sleep. – Peter Lawrey Jan 17 '14 at 19:02
  • Worth reading: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – Brian Roach Jan 17 '14 at 19:03
  • @PeterLawrey, OK, but with a simple kernel modification I can change this result from 1055ms to 1100ms. So this shows something. Is it possible to change this result with JVM flags? – MariuszS Jan 17 '14 at 19:05
  • I just ran this a few times and got between 1900 and 2300 each time. – The Guy with The Hat Jan 17 '14 at 19:05
  • @RyanCarlson exactly, I have optimized my code to achieve 1054ms – MariuszS Jan 17 '14 at 19:06
  • 1
    @MariuszS The JVM plays no part in how scheduling is done in the OS. Even playing with Thread priority often doesn't do what you might think. You can increase the priority of the process externally. – Peter Lawrey Jan 17 '14 at 19:06
  • Using `Thread.sleep(1000)` just once always gives me exactly 1000. – The Guy with The Hat Jan 17 '14 at 19:10
  • @RyanCarlson because this 54ms or more is related to scheduling. – MariuszS Jan 17 '14 at 19:11
  • @PeterLawrey Thanks, so the only way to make this code faster is OS/kernel customization? Which OS has a faster scheduler? – MariuszS Jan 17 '14 at 19:12
  • @JVMATL, no, the results differ. For default Windows Vista 1000ms, for other OSes from 1090 to 2000ms. – MariuszS Jan 17 '14 at 19:27
  • 1
    Sorry, but this is really a completely useless test, that is not going to give any meaningful result for anything. For example, there is no guarantee for the resolution of the timer. On some systems, it might be for example 20ms. – Jesper Jan 17 '14 at 19:29
  • @Jesper This is impossible with a normal OS :) This test shows OS and Java latency and the context switching time for Java; sometimes this is useful. – MariuszS Jan 17 '14 at 19:31
  • @MariuszS If you need low latency, I don't use the scheduler at all. I busy wait on an isolated CPU so you never give up the CPU and never have another thread running on that CPU. – Peter Lawrey Jan 17 '14 at 20:29
  • Like people have said, how well that specific test performs is largely going to depend on what else is happening on the system. How many times does your JVM context switch? What else is running on the system? What is your load average? How much RAM is in the machine, that kind of thing. These factors are much more important than choice of OS or JVM. – Mikkel Løkke Jan 22 '14 at 15:43
  • This test is repeatable with similar outcome without any problem. – MariuszS Jan 22 '14 at 15:48
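
The busy-wait approach mentioned in the comments above can be sketched roughly as follows. This is only an illustration and not anyone's actual code: the class name `BusyWaitLoop` and the helper `busyWaitNanos` are made up here, it does change the loop (which the question wants to avoid), and pinning the thread to an isolated CPU is an OS-level step that is not shown.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical example class: replaces Thread.sleep(1) with a spin on System.nanoTime().
class BusyWaitLoop {

    /** Spins until at least the given number of nanoseconds has elapsed; returns the elapsed ns. */
    static long busyWaitNanos(long nanos) {
        long start = System.nanoTime();
        long elapsed;
        do {
            // The thread never yields to the scheduler here, so it burns a full core.
            elapsed = System.nanoTime() - start;
        } while (elapsed < nanos);
        return elapsed;
    }

    public static void main(String[] args) {
        long oneMs = TimeUnit.MILLISECONDS.toNanos(1);
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            busyWaitNanos(oneMs);
        }
        long duration = System.nanoTime() - start;
        System.out.println("Loop duration " + TimeUnit.NANOSECONDS.toMillis(duration) + " ms.");
    }
}
```

The trade-off is that the thread occupies its CPU for the whole second instead of sleeping, and the OS can still preempt it unless the CPU is isolated.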

8 Answers

14

By calling sleep() you are basically telling the OS to suspend your thread for AT LEAST X milliseconds. There is no guarantee whatsoever that it will continue executing exactly after this time; the OS may re-schedule your thread later. Furthermore, the minimum amount of sleep time and its accuracy depend heavily on the OS.

EDIT: Also, you should take into account that in your case your code is (most probably) being interpreted! Java compiles to native code only the hotspots (hence the name of the HotSpot JIT), i.e. code that is executed frequently. For the server VM, the threshold is 10k executions of a given piece of code. You only have 1k.
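
To see how the requested and the actual sleep times differ on a given machine, each call can be timed individually instead of timing the whole loop. The following is a minimal sketch; the class name `SleepOvershoot` and the method `measureSleeps` are made up for illustration, and the numbers it prints will vary by OS and load.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical example class: times every Thread.sleep call separately.
class SleepOvershoot {

    /** Sleeps `count` times for `millis` ms each and returns each call's measured duration in ns. */
    static long[] measureSleeps(int count, long millis) throws InterruptedException {
        long[] durations = new long[count];
        for (int i = 0; i < count; i++) {
            long before = System.nanoTime();
            Thread.sleep(millis);
            durations[i] = System.nanoTime() - before;
        }
        return durations;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] durations = measureSleeps(1000, 1);
        long min = Long.MAX_VALUE;
        long max = 0;
        long sum = 0;
        for (long ns : durations) {
            min = Math.min(min, ns);
            max = Math.max(max, ns);
            sum += ns;
        }
        System.out.println("requested: " + TimeUnit.MILLISECONDS.toMicros(1) + " us per sleep");
        System.out.println("measured:  min " + min / 1000 + " us, avg "
                + sum / durations.length / 1000 + " us, max " + max / 1000 + " us");
    }
}
```

The spread between the minimum and the maximum shows how much of the total comes from a few late wake-ups rather than from a constant per-call cost.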

Svetlin Zarev
6

Note that your code is doing things besides waiting for precisely one second. It's entering code for a for loop, setting up variables to track it and iterating. But more than all this, you have to understand what else is happening with your system.

Your operating system has something called a scheduler that decides which running program ('process') gets to access the CPU at any given time. If a program, such as yours, goes to sleep (which is defined as 'don't do anything for at least x units of time'), the scheduler will often switch it out for another program (of which you have many running). When it gets switched back in is non-deterministic. Thus, even if you happen to be switched back in close to the one-second mark (which is likely), it is unlikely that it will be exactly at one second. Thus, 'improving' this code will never help with the underlying issue of wanting an exactly-one-second loop.

Note, too, that a program can be switched out by the scheduler at any time: the program needn't voluntarily go to sleep. That is the task of the scheduler: to arbitrate which processes get access to system resources at any particular point. Thus, time-profiling, especially in this sort of half-implemented way, is not particularly useful. Use an IDE profiler to get a better idea, because they measure things such as wall time.

Nathaniel Ford
  • I want to minimize this *wall time* :) – MariuszS Jan 17 '14 at 19:48
  • Your code is not doing anything significant. Reducing your wall time is going to be hard: you only have 5.5% of the total time it takes to play with because you're insisting on the rest. The OS will always add something, so you're dealing with even less than that. Maybe unroll your loop? – Nathaniel Ford Jan 17 '14 at 19:54
  • The OS will always add something, but I want to find the best OS or the best OS customization to make this faster. – MariuszS Jan 17 '14 at 19:56
  • 1
    In that case, see my answer. This is not a useful test for estimating performance of real code. – keshlam Jan 17 '14 at 20:24
2

Maybe what you really need to look into is http://en.wikipedia.org/wiki/Real_time_Java - if you need guarantees of low-latency, you need a JVM and an OS tuned to give you that.

JVMATL
  • This is not the answer I want, but the best here :) – MariuszS Jan 17 '14 at 19:55
  • 2
    Real time java is largely unsupported now. There is one commercial provider, but it is important to remember that real time != low latency. Real time means consistent latency (or low worst case latency) – Peter Lawrey Jan 17 '14 at 20:30
1

To my knowledge, one of the factors here is the kernel system tick rate (I think it's 200 ticks per second for desktops, 100 for servers, and 1000 for RT systems). This causes small delays which accumulate to the 55 ms. Additionally, the sleep call has some system overhead of its own, which is hard to reduce.
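
For a sense of scale, the arithmetic behind these numbers can be written out. The sketch below only uses the figures quoted in this thread (1055 ms total, 1000 sleeps of 1 ms, and the tick rates guessed above, which are assumptions rather than verified kernel settings); the class and method names are made up.

```java
// Hypothetical example class: plain arithmetic on the numbers quoted in the thread.
class TickArithmetic {

    /** Average extra time per sleep call, in microseconds. */
    static double overheadPerSleepMicros(long totalMs, int sleeps, long sleepMs) {
        return (totalMs - sleeps * sleepMs) * 1000.0 / sleeps;
    }

    /** Length of one timer tick in microseconds for a given ticks-per-second rate. */
    static double tickPeriodMicros(int ticksPerSecond) {
        return 1_000_000.0 / ticksPerSecond;
    }

    public static void main(String[] args) {
        // 1055 ms measured for 1000 sleeps of 1 ms each.
        System.out.println("overhead per sleep: " + overheadPerSleepMicros(1055, 1000, 1) + " us");
        // Tick rates as guessed in the answer; the real values depend on the kernel build.
        for (int hz : new int[] {100, 200, 1000}) {
            System.out.println(hz + " ticks/s -> tick period " + tickPeriodMicros(hz) + " us");
        }
    }
}
```

With the question's result this gives an average of 55 us of extra time per sleep, next to tick periods of 10000, 5000 and 1000 us for the three guessed rates.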

Uli Köhler
1

System.currentTimeMillis should not be used as a measure of elapsed time. You should be using System.nanoTime. Look here for a bit more explanation.
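
A minimal sketch of elapsed-time measurement with System.nanoTime, which according to its Javadoc is meant for measuring elapsed time and is not related to wall-clock time. The class name `ElapsedTime` and the helper `timeNanos` are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical example class: wraps a task in two System.nanoTime() readings.
class ElapsedTime {

    /** Runs the task once and returns how long it took, in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeNanos(() -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can still see it.
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Elapsed " + TimeUnit.NANOSECONDS.toMicros(elapsed) + " us");
    }
}
```

Only the difference between two nanoTime readings is meaningful; a single reading is not a timestamp.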

Kent Hawkings
1

Well, you could obviously factor out the TimeUnit conversion and save a few cycles. You could also count down rather than up; using a !=0 test is usually faster than comparing to other values.

You should also make sure the code is fully JITted (which can take several minutes of running it) before you take ANY measurements.

Generally, microbenchmarks are misleading in Java, and microoptimizing without knowing how much that code contributes to your runtime tends to be wasted effort in any case. Don't bother with this sort of exercise. Write the best code you can, give it lots of warm-up time on an assortment of real data, then use a profiler to see where it's spending its time (also on real data). That will tell you where performance tuning will actually be productive. Then consider algorithmic improvements, which tend to yield the highest benefit. Profile again with the new code and see what is hot now. Repeat.

Remember, infinite improvement of something that accounts for 1% of runtime takes infinite effort but yields only 1% improvement. Put your effort where it makes a difference. And, especially in HotSpot JVMs, where code continues to be optimized during execution but that optimization is not fully deterministic, don't trust a single execution to give you real performance numbers.
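
As a rough sketch of "warm up first, then measure more than once": the helper below runs a task a number of times without timing it and then collects several timed runs. The class name `RepeatedMeasurement`, the method `measure`, the placeholder workload and the run counts are all made up for illustration; the warm-up count in particular is arbitrary and does not guarantee that the code has been fully compiled.

```java
import java.util.Arrays;

// Hypothetical example class: untimed warm-up runs followed by several timed runs.
class RepeatedMeasurement {

    // Written to by the placeholder workload so the JIT cannot discard its result.
    static volatile double sink;

    /** Runs the task warmupRuns times untimed, then measuredRuns times timed; returns sorted ns timings. */
    static long[] measure(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long[] timings = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            timings[i] = System.nanoTime() - start;
        }
        Arrays.sort(timings);
        return timings;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real measurement would run the real code on real data.
        Runnable workload = () -> {
            double sum = 0;
            for (int i = 1; i <= 100_000; i++) {
                sum += Math.sqrt(i);
            }
            sink = sum;
        };
        long[] t = measure(workload, 2_000, 11);
        System.out.println("min " + t[0] + " ns, median " + t[t.length / 2]
                + " ns, max " + t[t.length - 1] + " ns");
    }
}
```

Reporting the minimum, median and maximum instead of one number makes run-to-run variation visible. A benchmark harness or a profiler, as recommended above, is still the better tool for real code.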

JVMATL
keshlam
  • (Yes, "microbenchmarks" would be clearer here; thanks for the edit.) – keshlam Jan 17 '14 at 20:22
  • You can try the -Xnojit option, if your implementation of java supports it. If not, check its documentation to see if it supports anything equivalent. Otherwise, you may not have any choice in the matter. (I'm not going to ask why you're bothering with performance questions at all if you're turning off the JIT.) – keshlam Jan 22 '14 at 15:59
0

Why don't you simply put it into one single sleep?

Thread.sleep(1000);

If you really want to do 1000 sleep commands, I recommend this:

public static void main(String[] args) throws InterruptedException {

    // This line is for timing purposes only!
    long start = System.currentTimeMillis();

    final long startTime = System.currentTimeMillis();
    long waitTime;

    for (int i = 0; i < 1000; i++) {
        // Compute the time this iteration should end at, then subtract the current system time.
        waitTime = (startTime + i + 1) - System.currentTimeMillis();

        // Only sleep if there is time left to wait (waitTime greater than 0).
        // A negative value would also throw an exception!
        if (waitTime > 0)
            Thread.sleep(waitTime);
    }

    // Same for these...
    long duration = System.currentTimeMillis() - start;
    System.out.println("Loop duration " + duration + " ms.");

}

This makes sure you only wait when it actually makes sense, i.e. when the loop is not already behind schedule.

BrainStone
0

Unfortunately, my question was misunderstood.

Real-time Java is abandoned, so the advice to use real-time Java is not valid.

After some research, this test gets its best results on some Windows machines.

On the tested Windows 8.1 machine this test prints exactly 1000 ms.

Other results:

  • Mac OS X 10.9.1 with Java 1.7.0_25: 1180 - 1190 ms
  • Ubuntu 12.04/Core i3/4GB: 1122 ms

MariuszS
  • Wait. So you want real-time functionality, but you aren't using a realtime OS or something like JamaicaVM? – Mikkel Løkke Jan 22 '14 at 15:38
  • No, I want low latency results. – MariuszS Jan 22 '14 at 15:48
  • 2
    You keep using that word. I do not think it means what you think it means: http://wordnetweb.princeton.edu/perl/webwn?s=latency – Mikkel Løkke Jan 22 '14 at 16:10
  • http://en.wikipedia.org/wiki/Interrupt_latency Latency generated by switching from Java program to OS (on sleep()) and back. Switching context needs some time. – MariuszS Jan 22 '14 at 17:38
  • Context switching is a feature of the OS. If you want to reduce the time spent context switching, you should choose an OS which has faster context switch times, like QNX, or have a processor with more cores so they happen less frequently. – Mikkel Løkke Jan 23 '14 at 07:05
  • or... Windows 8.1 ;) 0ms latency. – MariuszS Feb 01 '14 at 18:21