
I created a Java program to emit events at a specific frequency. I am using System.nanoTime() instead of Thread.sleep() because the former gives higher precision on the interval, according to many references here and here. However, when I try to set it to emit a data rate of 1M records/second, it does not seem to reach that rate. This is my code:

long delayInNanoSeconds = 1000000;

private void generateTaxiRideEvent(SourceContext<TaxiRide> sourceContext) throws Exception {
    gzipStream = new GZIPInputStream(new FileInputStream(dataFilePath));
    reader = new BufferedReader(new InputStreamReader(gzipStream, StandardCharsets.UTF_8));
    String line;
    TaxiRide taxiRide;
    while (reader.ready() && (line = reader.readLine()) != null) {
        taxiRide = TaxiRide.fromString(line);
        sourceContext.collectWithTimestamp(taxiRide, getEventTime(taxiRide));
        // sleep in nanoseconds to have a reproducible data rate for the data source
        this.dataRateListener.busySleep();
    }
}

public void busySleep() {
    final long startTime = System.nanoTime();
    while ((System.nanoTime() - startTime) < this.delayInNanoSeconds) ;
}

So, when I set delayInNanoSeconds to 10000 nanoseconds, I expect a workload of 100K rec/sec (1_000_000_000 / 10_000 = 100K r/s). With 2000 nanoseconds, I expect 500K rec/sec (1_000_000_000 / 2_000 = 500K r/s). With 1000 nanoseconds, I expect 1M rec/sec (1_000_000_000 / 1_000 = 1M r/s). And with 500 nanoseconds, 2M rec/sec (1_000_000_000 / 500 = 2M r/s).

I saw here that it could be better to use double instead of long to increase the precision. Is it somehow related? Or is the problem just an OS limitation (I am using Linux Ubuntu 18)? Or maybe it is because I am using the readLine() method and there is a faster way to emit these events? I think that when I use the GZIPInputStream class I am loading the whole file into memory and readLine() does not access the disk anymore. How can I increase the data rate of my application?

Peter Cordes
Felipe
  • 2
    Your assumption is that your code is executed in 0ns, but this is definitely not true. Try to run the code without delay and see what frequency you get. Then you need to subtract the time that your logic takes from the time of your delay. – Tobias Geiselmann May 28 '20 at 09:53
  • Did you even achieve over 1M r/s without any delay? – Tobias Geiselmann May 28 '20 at 09:54
  • But I want code whose frequency I can control without recompiling it, so that in my tests I can vary the data rate to any frequency I want. – Felipe May 28 '20 at 09:56
  • I understood the idea, but not well enough to see how to implement it. What would it look like in Java code? – Felipe May 28 '20 at 10:01
  • 2
    It’s very weird to see code trying to delay *reading from a file*. But anyway, just change your loop to `/*outer loop*/ { final long startTime = System.nanoTime(); /* do your operation */ while((System.nanoTime() - startTime) < this.delayInNanoSeconds) Thread.onSpinWait(); }`. – Holger May 28 '20 at 11:13
  • indeed, this helped. thanks. I don't get why you are using `Thread.onSpinWait()` – Felipe May 28 '20 at 11:24

1 Answer


@TobiasGeiselmann makes a good point: your delay calculation doesn't take into account the time spent between calls to busySleep.

You should be calculating a deadline relative to the last deadline, not the current time after logging. Don't use the result from the previous System.nanoTime() either; that will be some time >= the actual deadline (because nanoTime itself takes time, at least a few nanoseconds, so it unavoidably over-sleeps). You'd accumulate error that way.

Before the first iteration, find the current time and set long deadline = System.nanoTime();. At the end of every iteration, do deadline += 1000; and use your busy-wait loop to spin until now >= deadline.
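
The deadline scheme described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name DeadlinePacer, the runPaced helper, and the Runnable standing in for the readLine() / collectWithTimestamp(...) work are all hypothetical, not part of the question's code. The loop compares with `System.nanoTime() - deadline < 0` rather than `<` directly, which is the overflow-safe form the nanoTime() documentation recommends. Reading the target rate from the command line is one possible way to change the frequency without recompiling.

```java
final class DeadlinePacer {
    private final long intervalNanos;
    private long deadline;

    DeadlinePacer(long intervalNanos) {
        this.intervalNanos = intervalNanos;
        this.deadline = System.nanoTime(); // read the clock once, before the paced loop
    }

    /** Advances the deadline by one interval and spins until it has passed. */
    void awaitNext() {
        deadline += intervalNanos; // relative to the previous deadline, not to "now"
        // Subtract-and-compare stays correct even if nanoTime() wraps around.
        while (System.nanoTime() - deadline < 0) {
            Thread.onSpinWait(); // Java 9+ hint that this is a spin-wait loop
        }
    }

    /** Emits the given number of paced events; returns elapsed nanoseconds. */
    static long runPaced(int events, long intervalNanos, Runnable emit) {
        long start = System.nanoTime();
        DeadlinePacer pacer = new DeadlinePacer(intervalNanos);
        for (int i = 0; i < events; i++) {
            emit.run();        // stands in for readLine() + collectWithTimestamp(...)
            pacer.awaitNext(); // waits only for what is left of this interval
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Hypothetical CLI: target rate in records/second, default 1M.
        long rate = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        long intervalNanos = 1_000_000_000L / rate;
        int events = 1_000_000;
        long elapsed = runPaced(events, intervalNanos, () -> { });
        System.out.printf("target %d rec/s, achieved %.0f rec/s%n",
                rate, events * 1e9 / elapsed);
    }
}
```

Because each deadline is derived from the previous one, an iteration that runs late is followed by shorter waits until the loop has caught up, so the error does not accumulate.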


If deadline - now is large enough, use something that yields the CPU to other threads until close to the wakeup deadline. According to comments, LockSupport.parkNanos(…) is a good choice for modern Java, and may actually busy-wait for short enough sleeps(?). I don't really know Java. If so, you should just check the current time, calculate time till deadline, and call it once.

(For future CPUs like Intel Tremont (next-gen Goldmont), LockSupport.parkNanos could portably expose functionality like tpause to idle the CPU core until a given TSC deadline. Not via the OS, just a hyperthreading-friendly deadline pause, good for short sleeps on SMT CPUs.)

Busy-waiting is generally bad but is appropriate for high-precision very short delays. 1 microsecond is not long enough to usefully let the OS context switch to something else and back, on current hardware with current OSes. But longer sleep intervals (when you've chosen a lower frequency) should sleep to let the OS do something useful on this core, instead of just busy waiting for so long.

Ideally, when you are spinning on a time-check, you'd be executing an instruction like x86's pause in the delay loop, to be more friendly to the other logical core sharing the same physical core (hyperthreading / SMT). Java 9's Thread.onSpinWait(); should be called in spin-wait loops (especially when waiting on memory), which lets the JVM expose this concept in a portable way. (I assume that's what it's for.)
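
One way to combine the two ideas (yield the CPU while the deadline is far away, spin with Thread.onSpinWait() once it is close) might look like the sketch below. The class name HybridWait and the 50-microsecond threshold are assumptions made for illustration; a sensible threshold depends on the machine and OS and would need measuring. LockSupport.parkNanos may return early or spuriously, so the loop re-reads the clock after every park.

```java
import java.util.concurrent.locks.LockSupport;

final class HybridWait {
    // Assumed threshold: below this, parking is unlikely to pay off. Tune per machine.
    static final long SPIN_THRESHOLD_NANOS = 50_000L;

    /** Waits until System.nanoTime() reaches the given absolute deadline. */
    static void waitUntil(long deadline) {
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            if (remaining > SPIN_THRESHOLD_NANOS) {
                // Far from the deadline: give the core to other threads,
                // waking a little early to leave room for the final spin.
                LockSupport.parkNanos(remaining - SPIN_THRESHOLD_NANOS);
            } else {
                // Close to the deadline: busy-wait, with the spin-wait hint.
                Thread.onSpinWait();
            }
        }
    }

    public static void main(String[] args) {
        long intervalNanos = 1_000_000L; // 1 ms per event, a rate low enough to park
        long deadline = System.nanoTime();
        for (int i = 0; i < 1_000; i++) {
            // ... emit one event here ...
            deadline += intervalNanos;
            waitUntil(deadline);
        }
    }
}
```

At 1 microsecond per event the remaining time never exceeds the threshold, so this degenerates to the pure spin loop, which matches the point above that such short delays are not worth a context switch.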


This will work if your system is fast enough to keep up while running that time-getting function once per iteration. If not, then you could maybe check a deadline every 4 iterations (loop unrolling), to amortize the cost of nanoTime() so you log in bursts of 4 or something.
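
Checking the deadline only once per small burst could be sketched like this. The class name BurstPacer, the emitInBursts helper, and the Runnable standing in for the per-record work are hypothetical. The burst size of 4 in the example follows the suggestion above and is not a measured optimum.

```java
final class BurstPacer {
    /**
     * Emits events in bursts, reading the clock once per burst instead of
     * once per event. Returns the elapsed time in nanoseconds.
     */
    static long emitInBursts(int events, long intervalNanos, int burst, Runnable emit) {
        long start = System.nanoTime();
        long deadline = start;
        for (int i = 0; i < events; i++) {
            emit.run(); // stands in for the per-record work
            if ((i + 1) % burst == 0) {
                // One deadline step covers the whole burst.
                deadline += burst * intervalNanos;
                while (System.nanoTime() - deadline < 0) {
                    Thread.onSpinWait();
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int events = 1_000_000;
        long elapsed = emitInBursts(events, 1_000L, 4, () -> { });
        System.out.printf("achieved %.0f rec/s%n", events * 1e9 / elapsed);
    }
}
```

The average rate stays the same, but events now arrive in groups of 4 with no spacing inside a group, which may or may not matter for the downstream test.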

Of course if your system isn't fast enough even with no delay call at all, you'll need to optimize something to fix that. You can't delay for a negative amount of time, and checking the clock itself takes time.
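
To find out whether the loop can keep up at all, the unthrottled rate can be measured first, as suggested in the comments on the question. The sketch below is a hypothetical helper (ThroughputProbe and measureRate are made-up names). In the real program the Runnable would wrap the readLine(), TaxiRide.fromString(...), and collectWithTimestamp(...) calls; here a trivial counter stands in for them, so the number it prints says nothing about the real workload.

```java
final class ThroughputProbe {
    /** Runs emit with no delay at all and returns the achieved events per second. */
    static double measureRate(int events, Runnable emit) {
        long start = System.nanoTime();
        for (int i = 0; i < events; i++) {
            emit.run();
        }
        // Guard against a zero reading on very coarse clocks.
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return events * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        long[] sink = {0};
        double rate = measureRate(5_000_000, () -> sink[0]++);
        System.out.printf("unthrottled: %.0f rec/s (%d events)%n", rate, sink[0]);
    }
}
```

If the unthrottled figure is below the target rate, no delay scheme can reach the target, and the per-record work itself is what needs optimizing.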

Peter Cordes
  • I am not sure. I guess every `deadline += 1000` operation will consume more CPU, on top of the `<` comparison that is already in place. I guess there should be as few operations as possible inside the loop, and outside it as well. Or maybe I didn't understand your idea very well. – Felipe May 28 '20 at 10:39
  • 2
    @Felipe: integer addition is trivial in cost compared to checking the current time. Even if that was as cheap as one x86 `rdtsc` instruction without any scaling, `rdtsc` is microcoded and has a throughput of about one per 25 core clock cycles on a modern CPU like Skylake (https://uops.info/, https://agner.org/optimize/). But a 64-bit integer `add rdx, rcx` or something is a single uop, and modern CPUs have 4/clock throughput for that. Also, your version does `System.nanoTime() - startTime` every iteration. My way just does `System.nanoTime() < deadline` once you start iterating. – Peter Cordes May 28 '20 at 10:45
  • Do you mean something like this? `long startTime = System.nanoTime(); long deadLine = startTime + this.delayInNanoSeconds; while (System.nanoTime() < deadLine);` – Felipe May 28 '20 at 10:51
  • 2
    `Thread.sleep` would be a bad choice. `LockSupport.parkNanos(…)` would be the right thing for the remaining number of nanoseconds that have been calculated anyway. For the busy loop, Java 9 introduced [`Thread.onSpinWait();`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Thread.html#onSpinWait()) to be invoked within the loop as a marker; it doesn’t need to have an actual effect, but provides a hint to the environment, what you are doing, allowing potential optimizations. – Holger May 28 '20 at 11:17
  • 1
    @Holger: Thanks, updated. Neat that Java has introduced portable ways to expose hardware functionality like `pause` and even the upcoming `tpause` which `parkNanos` could take advantage of. – Peter Cordes May 28 '20 at 11:33
  • 2
    @Felipe: No, not like that. I mean initializing `deadline` before the *outer* loop, the one you're regulating. I was talking about the cost of stuff like `reader.readLine()`, not costs inside the delay loop. You need to keep state between calls to `busySleep`. I hope you already figured that out since you accepted the answer. – Peter Cordes May 28 '20 at 11:36