
I'm running two benchmarks to compare the cost of Thread.sleep() and Thread.onSpinWait():

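import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Thread)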
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ThreadSleep2Benchmark {
  private final ExecutorService executor = Executors.newFixedThreadPool(1);
  volatile boolean run;

  @Param({"1", "5", "10", "50", "100"})
  long delay;

  @Setup(Level.Invocation)
  public void setUp() {
    run = true;
    startThread();
  }

  @TearDown(Level.Trial)
  public void tearDown() {
    executor.shutdown();
  }

  @Benchmark
  public int sleep() throws Exception {
    while (run) {
      Thread.sleep(1);
    }
    return hashCode();
  }

  private void startThread() {
    executor.submit(() -> {
      try {
        Thread.sleep(delay / 2);
        run = false;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    });
  }
}

Then I run the one with Thread.onSpinWait():

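import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Thread)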
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ThreadOnSpinWaitBenchmark {
  private final ExecutorService executor = Executors.newFixedThreadPool(1);
  volatile boolean run;

  @Param({"1", "5", "10", "50", "100"})
  long delay;

  @Setup(Level.Invocation)
  public void setUp() {
    run = true;
    startThread();
  }

  @TearDown(Level.Trial)
  public void tearDown() {
    executor.shutdown();
  }

  @Benchmark
  public int onSpinWait() {
    while (run) {
      Thread.onSpinWait();
    }
    return hashCode();
  }

  private void startThread() {
    executor.submit(() -> {
      try {
        Thread.sleep(delay / 2);
        run = false;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    });
  }
}

Both show nearly the same results for delay > 1 ms:

Benchmark                             (delay)  Mode  Cnt   Score    Error  Units

ThreadOnSpinWaitBenchmark.onSpinWait        1  avgt   20   0,003 ±  0,001  ms/op
ThreadOnSpinWaitBenchmark.onSpinWait        5  avgt   20   2,459 ±  0,027  ms/op
ThreadOnSpinWaitBenchmark.onSpinWait       10  avgt   20   5,957 ±  0,064  ms/op
ThreadOnSpinWaitBenchmark.onSpinWait       50  avgt   20  27,915 ±  0,225  ms/op
ThreadOnSpinWaitBenchmark.onSpinWait      100  avgt   20  53,112 ±  0,343  ms/op

ThreadSleep2Benchmark.sleep                 1  avgt   20   1,420 ±  0,043  ms/op
ThreadSleep2Benchmark.sleep                 5  avgt   20   3,183 ±  0,099  ms/op
ThreadSleep2Benchmark.sleep                10  avgt   20   6,723 ±  0,069  ms/op
ThreadSleep2Benchmark.sleep                50  avgt   20  29,697 ±  0,307  ms/op
ThreadSleep2Benchmark.sleep               100  avgt   20  54,730 ±  0,329  ms/op

This is as expected.

However, I'd like to measure the CPU load of both approaches. I know that on Linux I can use LinuxPerfNormProfiler, but I'm not sure which particular metric I should use to get reliable insight.

Sergey Tsypanov
  • I'm not sure what you are measuring here. `Thread.sleep` has virtually 0 CPU load. Your thread will be blocked, and will not be given another shot at the CPU until the time expires. `Thread.onSpinWait` does not block; it just allows other threads to run momentarily, but your `onSpinWait` function is going to consume lots of CPU resources. They are used for very different things. Measuring elapsed time is NOT a good metric of CPU load. – Tim Roberts Jul 28 '22 at 21:17
  • @TimRoberts "Measuring elapsed time is NOT a good metric of CPU load" this is exactly why I've asked about other metrics – Sergey Tsypanov Jul 30 '22 at 05:04
  • The thing is, it's not like these are "close". They are very different approaches. In 1 second of elapsed time, the "sleep" thread will consume 0s of CPU time. The "onSpinWait" thread will consume nearly 1s. One is the right approach, one is the wrong approach. – Tim Roberts Jul 30 '22 at 05:44
  • What do you mean by "right approach" and "wrong approach"? I assume it depends on the usage scenario, doesn't it? – Sergey Tsypanov Aug 01 '22 at 11:13
  • Yes, but. What you're showing here is an artificial workload. `onSpinWait` is very similar to `Thread.sleep(0)`. It gives up the CPU and immediately asks for it back. In this artificial example, that's wrong. You have to know whether your algorithm is CPU-bound or IO-bound and make "good neighbor" decisions based on that. – Tim Roberts Aug 01 '22 at 17:25
  • Which approach is preferable to you in case of IO-bound algorithm - `onSpinWait` or `sleep`? – Sergey Tsypanov Aug 02 '22 at 17:19
  • If you have work to do, then you do the work. If you don't have work to do, then you sleep. `onSpinWait` is intended for a very narrow use case -- when you are in the main UI thread, in a CPU-intensive loop, and you want to allow the main loop to process pending messages so the UI doesn't appear frozen. Since the Best Practice is to spin CPU-intensive tasks into a separate thread (which does not impact the UI), it is very rarely used. – Tim Roberts Aug 02 '22 at 18:41
  • @TimRoberts this is not how onSpinWait works. Check my answer for more detail. – pveentjer Aug 11 '22 at 15:21
  • Thank you for the explanation. That means it is an extremely low-level micro-optimization that is even less useful than the UI-oriented implementation I had assumed, and should almost never occur in user code. – Tim Roberts Aug 11 '22 at 17:00

3 Answers


onSpinWait is very different from Thread.sleep(0).

When you are working with non-blocking algorithms and spinning on some variable while waiting for a certain state, the CPU can run into a very expensive pipeline flush at the moment that state is reached. This is because the instruction pipeline is filled with loads of that variable, and these can be executed out of order, so an earlier load could see a newer value than a later load. This incoherent behavior (first seeing a new value and then going back to an old one) is not allowed.

This is called a memory order violation, and on x86 it leads to a machine clear, which causes the instruction pipeline to be flushed. The CPU then needs to restart from the last retired instruction and try again; by then the field may no longer be in the expected state, so the whole cycle can start over.

To avoid running into this memory order violation, onSpinWait executes a PAUSE instruction on x86. This stalls the instruction pipeline for a certain number of cycles and prevents the expensive memory order violation from happening.

Another advantage of PAUSE is that it gives the sibling hyper-thread more headroom and the CPU heats up less.
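As a minimal sketch of the kind of loop described above (not code from the question): one thread spins on a flag and calls Thread.onSpinWait() in each iteration until another thread sets it. The class name SpinWaitSketch and its methods are hypothetical, and whether a PAUSE is actually emitted depends on the JVM and the CPU, since onSpinWait is only a hint.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitSketch {
  // Flag the waiting thread spins on; written by the signalling thread.
  private final AtomicBoolean ready = new AtomicBoolean(false);

  /** Busy-waits until another thread calls signal(). */
  public void awaitReady() {
    while (!ready.get()) {
      // Hint to the runtime that this is a spin-wait loop (assumption:
      // on x86 the JIT turns this into PAUSE, as described in the answer).
      Thread.onSpinWait();
    }
  }

  /** Releases the spinning thread. */
  public void signal() {
    ready.set(true);
  }

  /** Starts a spinning waiter, signals it, and reports whether it terminated. */
  public static boolean runDemo() {
    SpinWaitSketch flag = new SpinWaitSketch();
    Thread waiter = new Thread(flag::awaitReady);
    waiter.setDaemon(true);
    waiter.start();
    try {
      Thread.sleep(10);   // let the waiter spin for a short while
      flag.signal();
      waiter.join(5_000); // bounded wait so the demo cannot hang
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !waiter.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("waiter terminated: " + runDemo());
  }
}
```

The waiting thread never gives up its core here; the hint only makes the spinning cheaper, which is why this pattern fits very short waits.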

For some information about recent changes in the length of the PAUSE instruction, check the following link.

pveentjer

It seems that the events command-line argument allows us to specify the perf events we want LinuxPerfNormProfiler to collect.

To get the list of supported perf events on Linux, run perf list in a terminal (see the perf wiki).

task-clock seems to be the recommended event to measure CPU load.

P.S. With LinuxPerfAsmProfiler you might even get CPU load per function.
You might also need to do some extra work to enable full Java support in perf.
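As a rough cross-check that needs no perf setup, per-thread CPU time can also be read from inside the JVM. The sketch below is an assumption-laden alternative, not something the profiler does: it uses the standard ThreadMXBean API to compare the CPU time consumed by a sleep loop and a spin loop over the same wall-clock interval. The class name CpuTimeSketch and its helper methods are hypothetical, and the timer resolution depends on the platform.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSketch {
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  /** Returns the CPU nanoseconds the current thread spent inside waitLoop. */
  public static long cpuNanos(Runnable waitLoop) {
    long before = THREADS.getCurrentThreadCpuTime();
    waitLoop.run();
    return THREADS.getCurrentThreadCpuTime() - before;
  }

  /** Waits for roughly millis of wall-clock time using Thread.sleep(1). */
  public static void sleepFor(long millis) {
    long deadline = System.nanoTime() + millis * 1_000_000L;
    while (System.nanoTime() < deadline) {
      try {
        Thread.sleep(1);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /** Waits for roughly millis of wall-clock time by spinning. */
  public static void spinFor(long millis) {
    long deadline = System.nanoTime() + millis * 1_000_000L;
    while (System.nanoTime() < deadline) {
      Thread.onSpinWait();
    }
  }

  public static void main(String[] args) {
    long sleepCpu = cpuNanos(() -> sleepFor(200));
    long spinCpu = cpuNanos(() -> spinFor(200));
    // Both loops take about the same elapsed time; only the CPU time differs.
    System.out.printf("sleep: %d us CPU, spin: %d us CPU%n",
        sleepCpu / 1_000, spinCpu / 1_000);
  }
}
```

On a typical machine the spin loop should report CPU time close to the elapsed time while the sleep loop reports only a small fraction of it, which is the difference that elapsed-time scores cannot show.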

qwerty
  • The answer has links without any information – Krishna Majgaonkar Jul 31 '22 at 11:51
  • @KrishnaMajgaonkar the question is "I'm not sure which particular metric I should take". The answer says: use `task-clock` + provides additional links, which might be very useful to those who profile Java with `perf`. – qwerty Aug 01 '22 at 05:21
  • I'm sure your answer will be useful but it's recommended to quote relevant information from the link. Please refer to the document on [how to write a good answer](https://stackoverflow.com/help/how-to-answer). Also, it's important to provide hyperlinks to the correct text. The page opened after clicking the link `seems` doesn't make any sense – Krishna Majgaonkar Aug 01 '22 at 08:11
  • @KrishnaMajgaonkar the first [seems](https://github.com/openjdk/jmh/blob/1.35/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfNormProfiler.java#L73) is a link to the place in the `LinuxPerfNormProfiler`'s sources where the `events` argument is defined (I found no `LinuxPerfNormProfiler` documentation websites; the source code works as documentation in such cases). The second [seems](https://stackoverflow.com/a/56967896) is a link to a SO answer which provides some details of what exactly the `task-clock` measures. Also I made the answer a community wiki, so feel free to improve it. – qwerty Aug 01 '22 at 14:44

Given a choice between Thread.sleep() and Thread.onSpinWait(), the right choice is probably Thread.sleep().

If you already have a specific reason for using onSpinWait() – like you're doing low-level performance work – then I think it's a safe assumption that you wouldn't be asking the question to begin with, nor would you be comparing it against Thread.sleep().

If you do not have a specific reason to use onSpinWait(), then you should not use it.


Javadoc for Thread.onSpinWait() says:

By invoking this method within each iteration of a spin-wait loop construct, the calling thread indicates to the runtime that it is busy-waiting.

Wikipedia article for busy waiting says:

busy-waiting, busy-looping or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input or a lock is available

In most cases spinning is considered an anti-pattern and should be avoided, as processor time that could be used to execute a different task is instead wasted on useless activity. Spinning can be a valid strategy in certain circumstances, most notably in the implementation of spinlocks within operating systems designed to run on SMP systems.


Beyond either of these, there are other mechanisms available in Java for managing interactions between threads, so perhaps those would be worth exploring if Thread.sleep() is limiting somehow. The Java Tutorial section on Concurrency is a good place to start.
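As one illustration of such a mechanism, here is a minimal sketch that replaces the polling loop with a CountDownLatch: the waiting thread blocks until the background task signals completion, so it neither spins nor wakes up every millisecond. The class name LatchSketch and the method awaitBackgroundTask are hypothetical, and the background task only simulates work with a sleep.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchSketch {
  /**
   * Runs a background task that takes about delayMillis and blocks until it
   * signals completion. Returns true if the signal arrived within timeoutMillis.
   */
  public static boolean awaitBackgroundTask(long delayMillis, long timeoutMillis) {
    CountDownLatch done = new CountDownLatch(1);
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      executor.submit(() -> {
        try {
          Thread.sleep(delayMillis); // stand-in for real work
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          done.countDown();          // wake up the waiting thread
        }
      });
      // Blocks without polling; the thread is parked until countDown() or timeout.
      return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("signalled in time: " + awaitBackgroundTask(50, 5_000));
  }
}
```

Compared with the loops in the question, the waiter is woken once, when the condition actually changes, rather than re-checking a volatile flag.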

Kaan