I created a Java program to emit events at a specific frequency. I am using a busy-wait on System.nanoTime() instead of Thread.sleep(), because according to several references the former gives higher precision for short intervals. However, when I set it to emit 1M records/second, it does not achieve that rate. This is my code:
long delayInNanoSeconds = 1000000;

private void generateTaxiRideEvent(SourceContext<TaxiRide> sourceContext) throws Exception {
    gzipStream = new GZIPInputStream(new FileInputStream(dataFilePath));
    reader = new BufferedReader(new InputStreamReader(gzipStream, StandardCharsets.UTF_8));
    String line;
    TaxiRide taxiRide;
    while (reader.ready() && (line = reader.readLine()) != null) {
        taxiRide = TaxiRide.fromString(line);
        sourceContext.collectWithTimestamp(taxiRide, getEventTime(taxiRide));
        // sleep in nanoseconds to have a reproducible data rate for the data source
        this.dataRateListener.busySleep();
    }
}

public void busySleep() {
    final long startTime = System.nanoTime();
    while ((System.nanoTime() - startTime) < this.delayInNanoSeconds) ;
}
So, when I set delayInNanoSeconds to 10_000 nanoseconds I should get a workload of 100K rec/sec (1_000_000_000 / 10_000 = 100K r/s); with 2_000 nanoseconds, 500K rec/sec (1_000_000_000 / 2_000 = 500K r/s); with 1_000 nanoseconds, 1M rec/sec (1_000_000_000 / 1_000 = 1M r/s); and with 500 nanoseconds, 2M rec/sec (1_000_000_000 / 500 = 2M r/s).
I also read that it could be better to use double instead of long to increase the precision. Is that somehow related? Or is the problem just an OS limitation (I am using Ubuntu 18)? Or maybe it is because I am using the readLine() method and there is a faster way to emit these events? I think that when I use the GZIPInputStream class the whole file is loaded into memory, so readLine() does not access the disk anymore. How can I increase the data rate of my application?