0

A simple question I've been wondering about: of the following two versions of code, which is better optimized? Assume the time value from the System.currentTimeMillis() call only needs to be reasonably accurate, so caching should be considered purely from a performance point of view.

This (with value caching):

    long time = System.currentTimeMillis();
    for (long timestamp : times) {
        if (time - timestamp > 600000L) {
            // Do something
        }
    }

Or this (no caching):

    for (long timestamp : times) {
        if (System.currentTimeMillis() - timestamp > 600000L) {
            // Do something
        }
    }

I'm assuming System.currentTimeMillis() is already a very optimized and lightweight method call, but let's assume I'll be calling it many, many times in a short period.

How many values must the "times" collection/array contain to justify caching the return value of System.currentTimeMillis() in its own variable?

Is this better to do from a CPU or memory optimization point of view?

Kael Eppcohen
  • 177
  • 1
  • 2
  • 10
  • Since the time is changing, they potentially do different things. The value could quite easily change by 1 even in a simple case. You've also missed caching `TimeUnit.MINUTES.toMillis(10)` – John3136 May 21 '18 at 22:56
  • @John3136 I forgot to mention that the value of the time variable doesn't need to be super accurate, so caching won't hurt the accuracy of the problem. – Kael Eppcohen May 21 '18 at 23:26
  • There will be a gain if you are concurrently thrashing on the lookup, but prefer readability otherwise. Typically it is `nanoTime` that is the bottleneck, on some OSes, due to not being well optimized by the kernel. On Linux there isn't a difference. If you are writing high performance software then use JMH, else go for maintainability. – Ben Manes May 22 '18 at 01:33

2 Answers

3

A `long` is basically free. A JVM with a JIT compiler can keep it in a register, and since it's loop-invariant, it can even optimize your loop condition to `-timestamp < 600000L - time` or `timestamp > time - 600000L`. i.e. the loop condition becomes a trivial compare between the iterator and a loop-invariant constant in a register.

So yes it's obviously more efficient to hoist a function call out of a loop and keep the result in a variable, especially when the optimizer can't do that for you, and especially when the result is a primitive type, not an Object.
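To make that transformation concrete, here is a hand-hoisted sketch of what the JIT can do with the cached version: the threshold is computed once outside the loop, so the body reduces to a single compare. The `times` array and the `stale` counter are hypothetical stand-ins for the question's data and its "Do something".

```java
public class HoistDemo {
    public static void main(String[] args) {
        // Hypothetical timestamps: one current, one 11 minutes old.
        long now = System.currentTimeMillis();
        long[] times = { now, now - 660_000L };

        // Compute the loop-invariant threshold once; `timestamp < cutoff`
        // is equivalent to `now - timestamp > 600000L`, but the loop body
        // is now a single compare against a register-resident constant.
        long cutoff = now - 600_000L;
        int stale = 0;
        for (long timestamp : times) {
            if (timestamp < cutoff) {
                stale++;  // "Do something": this entry is over 10 minutes old
            }
        }
        System.out.println(stale);  // only the 11-minute-old entry matches
    }
}
```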


Assuming your code is running on a JVM that JITs x86 machine code, System.currentTimeMillis() will probably include at least an `rdtsc` instruction and some scaling of that result (see footnote 1). So the cheapest it can possibly be (on Skylake for example) is a micro-coded 20-uop instruction with a throughput of one per 25 clock cycles (http://agner.org/optimize/).

If your // Do something is simple, like just a few memory accesses that usually hit in cache, some simple calculation, or anything else that out-of-order execution handles well, the call could be most of the cost of your loop. Unless each loop iteration typically takes multiple microseconds (i.e. time for thousands of instructions on a 4GHz superscalar CPU), hoisting System.currentTimeMillis() out of the loop can probably make a measurable difference. Whether that difference is small or huge depends on how simple your loop body is.

If you can prove that hoisting it out of your loop won't cause correctness problems, then go for it.

Even with it inside your loop, your thread could still sleep for an unbounded length of time between calling it and doing the work for that iteration. But hoisting it out of the loop makes it more likely that you could actually observe this kind of effect in practice; running more iterations "too late".


Footnote 1: On modern x86, the time-stamp counter runs at a fixed rate, so it's useful as a low-overhead timesource, and less useful for cycle-accurate micro-benchmarking. (Use performance counters for that, or disable turbo / power saving so core clock = reference clock.)

IDK if a JVM would actually go to the trouble of implementing its own time function, though. It might just use an OS-provided time function. On Linux, gettimeofday and clock_gettime are implemented in user-space (with code + scale factor data exported by the kernel into user-space memory, in the VDSO region). So glibc's wrapper just calls that, instead of making an actual syscall.

So clock_gettime can be very cheap compared to an actual system call that switches to kernel mode and back. That can take at least 1800 clock cycles on Skylake, on a kernel with Spectre + Meltdown mitigation enabled.

So yes, it's hopefully safe to assume System.currentTimeMillis() is "very optimized and lightweight", but even rdtsc itself is expensive compared to some loop bodies.
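As Ben Manes notes in the comments, JMH is the right tool for a trustworthy measurement, but a naive harness along the following lines can at least show the direction of the effect. This is a sketch, not a proper benchmark: exact numbers vary by hardware, OS, and JVM, and the hoisted loop may be optimized down to almost nothing.

```java
public class TimeCallCost {
    public static void main(String[] args) {
        final int iters = 10_000_000;
        long sink = 0;

        // Variant 1: call currentTimeMillis() every iteration.
        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            sink += System.currentTimeMillis();
        }
        long perCall = System.nanoTime() - t0;

        // Variant 2: hoist the call out of the loop and reuse the value.
        t0 = System.nanoTime();
        long cached = System.currentTimeMillis();
        for (int i = 0; i < iters; i++) {
            sink += cached;
        }
        long hoisted = System.nanoTime() - t0;

        // Report average ns per iteration; the per-call variant is
        // typically the slower of the two.
        System.out.println("per-call: " + perCall / iters + " ns/iter");
        System.out.println("hoisted:  " + hoisted / iters + " ns/iter");
        if (sink == 42) System.out.println();  // keep `sink` live so the loops aren't eliminated
    }
}
```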

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 1
    `System.currentTimeMillis()` is comparable in cost to the user-space `clock_gettime` calls (and in fact may use them directly) and usually costs on the order of 20-30 ns on a good implementation. In the past and perhaps on some platforms even today it was dramatically slower, 1000ns or more. – BeeOnRope May 22 '18 at 01:43
  • Couldn't have asked for a better response. I was never sure if I was doing it right, glad I asked for clarification. – Kael Eppcohen May 22 '18 at 01:45
  • 1
    I think saving the results of function calls is really a no-brainer. Not just for performance (the difference could be _huge_ here), but also help reasoning about your code. Some people seem to imply repeatedly calling the function is more readable, but I don't really agree: here you are trying to compare many values in `time` to the current time, so it entirely makes sense to freeze the current time by copying it locally - it's just much easier to reason about what's going on, rather than trying to determine what happens if the function return value changes. – BeeOnRope May 22 '18 at 02:51
  • I don't think `currentTimeMillis` (unlike `nanoTime`) uses `rdtsc` at all. Somewhere I read that it only reads a variable which gets periodically incremented. – maaartinus Jun 18 '18 at 13:36
  • @maaartinus: Was the thing you read talking about a *kernel* variable incremented by timer interrupts? That's another way to implement time, and used to be common before low-overhead `rdtsc` existed and was viable as a timesource. But the precision is only as good as timer-interrupt frequency, and you want that as infrequent as possible for lower CPU overhead. A JVM itself could use that mechanism in user-space, with a periodic signal handler or a thread that sleeps in a loop and updates a time variable inside the JVM. But that burns CPU unnecessarily unless `cTM()` is called a *lot*. – Peter Cordes Jun 18 '18 at 13:39
  • I can't recall the details. For `cTM` you need only once per millisecond, which shouldn't be a problem. `cTM` is typically used with every logging statement, which may indeed be a lot. I'm afraid, I won't find the article anymore... – maaartinus Jun 18 '18 at 14:22
  • 1
    @maaartinus: yeah, sounds like a good optimization for programs that spam log output at more than 1 per ms. If a JVM could adaptively figure out whether it was running a program that calls `cTM` that often or not, switching to a user-space time counter might be a good optimization. I'm not interested enough to actually trace / profile a JVM, but it would certainly be possible to figure out what it does from looking at system calls it makes / doesn't make. (Probably it would need a thread that sleeps, or a signal handler, to update the variable. Otherwise it couldn't know when it needed to.) – Peter Cordes Jun 18 '18 at 14:28
  • 2
    @maaartinus - as far as I've seen, on Linux, `cTM` pretty much uses the same implementation as `nanoTime` - they generally have almost identical performance and other characteristics. – BeeOnRope Jun 18 '18 at 16:37
0

In your case, the method call should be hoisted out of the loop.


System.currentTimeMillis() simply reads a value from OS memory, so it is very cheap (a few nanoseconds), whereas System.nanoTime() may involve a call to hardware and can therefore be orders of magnitude slower on some platforms.

spongebob
  • 8,370
  • 15
  • 50
  • 83