15

I've been looking at some of the Java primitive collections (Trove, fastutil, HPPC) and I've noticed a pattern: instance fields are sometimes copied into final local variables. For example:

public void forEach(IntIntProcedure p) {
    final boolean[] used = this.used;
    final int[] key = this.key;
    final int[] value = this.value;
    for (int i = 0; i < used.length; i++) {
        if (used[i]) {
          p.apply(key[i],value[i]);
        }
    }
}

I've done some benchmarking, and it appears that it is slightly faster when doing this, but why is this the case? I'm trying to understand what Java would do differently if the first three lines of the function were commented out.

Note: This seems similar to this question, but that was for C++ and doesn't address why they are declared final.

job

5 Answers

29

Accessing a local variable or parameter is a single-step operation: take the variable located at offset N on the stack. If your function has 2 arguments (simplified):

  • N = 0 - this
  • N = 1 - first argument
  • N = 2 - second argument
  • N = 3 - first local variable
  • N = 4 - second local variable
  • ...

So when you access a local variable, you have one memory access at a fixed offset (N is known at compile time). This is the bytecode for accessing the first method argument (an int):

iload 1  //N = 1

However, when you access a field, you are actually performing an extra step. First you read the "local variable" this just to determine the current object's address. Then you load a field (getfield), which has a fixed offset from this. So you perform two memory operations instead of one. Bytecode:

aload 0  //N = 0: this reference
getfield total I  //int total

So technically, accessing local variables and parameters is faster than accessing object fields. In practice, many other factors may affect performance (including the various levels of CPU cache and JVM optimizations).
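To see the two access patterns side by side, here is a minimal sketch. The class SlotDemo and its methods are hypothetical names made up for illustration; only the field name total mirrors the bytecode above. Compiling it and running javap -c -p SlotDemo should show a single load from slot 1 in addLocal and an aload_0 followed by getfield in addField, though the exact listing can differ between compiler versions.

```java
public class SlotDemo {
    // Hypothetical field, named after "total" in the bytecode shown above.
    private int total = 40;

    // Reads a parameter: slot 0 holds "this", slot 1 holds x,
    // so javac is expected to emit a single local load (iload_1) for x.
    int addLocal(int x) {
        return x + 2;
    }

    // Reads a field: javac is expected to emit aload_0 (load "this")
    // followed by getfield before the addition.
    int addField() {
        return total + 2;
    }

    public static void main(String[] args) {
        SlotDemo d = new SlotDemo();
        System.out.println(d.addLocal(40));
        System.out.println(d.addField());
    }
}
```

Both methods return the same value here; the difference is only in how the operand reaches the operand stack, and a JIT compiler may well erase that difference in compiled code.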

final is a different story. It is basically a hint for the compiler/JIT that this reference won't change, so it can make some heavier optimizations. This is much harder to track down, but as a rule of thumb, use final whenever possible.

Tomasz Nurkiewicz
  • 6
    I suppose this answer (and particularly its last paragraph) is better than the marked one. – John Doe Sep 29 '12 at 11:34
  • I have to wonder if some of the speedup in final could be that a smart JIT could know to reuse a pointer before the object goes out of scope, and save on alloc(), plus get better cache hits from having a slightly smaller memory footprint... – Ajax Feb 09 '13 at 10:54
  • Totally agree. Most useful answer. – omniyo Jul 30 '14 at 12:42
7

The final keyword is a red herring here. The performance difference comes because they are saying two different things.

public void forEach(IntIntProcedure p) {
  final boolean[] used = this.used;
  for (int i = 0; i < used.length; i++) {
    ...
  }
}

is saying, "fetch a boolean array, and for each element of that array do something."

Without final boolean[] used, the function is saying "while the index is less than the length of the current value of the used field of the current object, fetch the current value of the used field of the current object and do something with the element at index i."

The JIT might have a much easier time proving loop-bound invariants, eliminating excess bounds checks and so on, because it can much more easily determine what would cause the value of used to change. Even ignoring multiple threads, if p.apply could change the value of used, then the JIT can't eliminate bounds checks or do other useful optimizations.
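The semantic difference can be made visible with a callback that replaces the arrays during iteration. The sketch below is illustrative only: SnapshotDemo, clear() and visitedKeys() are hypothetical names, and IntIntProcedure is a stand-in for the interface in the question, not the actual interface from any of the libraries mentioned.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    /** Hypothetical stand-in for the IntIntProcedure callback in the question. */
    interface IntIntProcedure { void apply(int key, int value); }

    boolean[] used = {true, true, true};
    int[] key = {1, 2, 3};
    int[] value = {10, 20, 30};

    /** Snapshot version: the three fields are read once, before the loop. */
    void forEachLocal(IntIntProcedure p) {
        final boolean[] used = this.used;
        final int[] key = this.key;
        final int[] value = this.value;
        for (int i = 0; i < used.length; i++) {
            if (used[i]) p.apply(key[i], value[i]);
        }
    }

    /** Field version: every access re-reads this.used, this.key and this.value. */
    void forEachField(IntIntProcedure p) {
        for (int i = 0; i < used.length; i++) {
            if (used[i]) p.apply(key[i], value[i]);
        }
    }

    /** Swaps in new, empty arrays (assumed here to mimic what a clear or rehash might do). */
    void clear() {
        used = new boolean[0];
        key = new int[0];
        value = new int[0];
    }

    /** Runs one of the loops with a callback that clears the map on every call. */
    static List<Integer> visitedKeys(boolean useLocals) {
        SnapshotDemo m = new SnapshotDemo();
        List<Integer> seen = new ArrayList<>();
        IntIntProcedure clearing = (k, v) -> { seen.add(k); m.clear(); };
        if (useLocals) m.forEachLocal(clearing); else m.forEachField(clearing);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(visitedKeys(true));  // prints [1, 2, 3]
        System.out.println(visitedKeys(false)); // prints [1]
    }
}
```

With the local copies, the loop keeps walking the arrays it captured at the start. Without them, the loop condition re-reads the field after the first callback, sees the empty array and stops. Because the two methods can behave differently, the JIT has to prove that the callback never reassigns the fields before it can treat the field version like the snapshot version.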

Mike Samuel
  • I'm confused as to what you mean by `final` is a red herring. You mean it's not necessarily faster to access the variable, but the JIT compiler can optimize the loop to eliminate range checks and lookups? – job Jul 06 '11 at 21:26
  • "Even ignoring multiple threads" - just to make this clear: The JIT **only** considers thread local behavior. This means even if used is public (or there's a setter method) and may be changed by another thread the JIT has every right to ignore this. So the JIT really only has to figure out if apply() will change the reference or not (in practice: If it can inline the call (and all subcalls) it'll notice it, otherwise you're out of luck most certainly) – Voo Jul 06 '11 at 21:26
  • Also there's a good chance that the "faster" behavior comes because someone wrote once again an invalid java benchmark (too easy to do that and way too hard to get it right) - there should be a performance difference in the interpreter but if apply is quite simple, there really shouldn't be any difference in compiled code with modern Hotspot – Voo Jul 06 '11 at 21:29
  • @job, I meant that the JIT compiler is better at local reasoning than global reasoning, so that the field is `final` is not as important as that it is local. I am questioning the premise that "final local variables" are faster and saying that uses of "local variables" are easier to optimize than uses of class instance members. – Mike Samuel Jul 06 '11 at 21:44
  • @Voo, is that true even if `p.apply` synchronizes? – Mike Samuel Jul 06 '11 at 21:45
  • @Mike If apply is a synchronized method this means we get a happens-before relationship for the instance of this method (ie the p) with any other synchronized method of the instance. So synchronized wouldn't help at all (if p==this it gets a bit more complicated though, if the variable was set in another synchronized method then yes you'll see the changes, otherwise no). The simplest approach to guarantee that threads never read stale values would be to make the variables volatile (which I now see I totally forgot to mention in the first comment, my bad). – Voo Jul 07 '11 at 00:55
  • @Voo, Besides just doing mutex like exclusion, I thought `synchronized` blocks also caused changes to an object in one thread to become apparent in other threads. And I didn't say anything about synchronized method. If `p.apply` does `synchronized (foo) { ... }` where `foo` is `this` in the caller of `p.apply` then the thread calling `p.apply` should see changes to `foo` made by other threads. – Mike Samuel Jul 07 '11 at 05:29
1

It tells the runtime (JIT) that in the context of that method call, those 3 values will never change, so the runtime does not need to continually load the values from the member variables. This may give a slight speed improvement.

Of course, as the JIT gets smarter and can figure out these things on its own, these conventions become less useful.

Note: I didn't make it clear that the speedup comes more from using a local variable than from the final part.

jtahlborn
  • Hey, I was typing this too! :-) Except I think even the compiler can benefit from knowing the method is not interested in parallel changes in these references. – Szocske Jul 06 '11 at 21:12
1

In the generated VM opcodes local variables are entries on the operand stack while field references must be moved to the stack via an instruction that retrieves the value through the object reference. I imagine the JIT can make the stack references register references more easily.

antlersoft
  • 3
    Not quite right. Local variables are placed on the thread *stack*, not on *operand stack*. various `load`/`store` opcodes are used to move local variables from stack to operand stack and back. See [this image](http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode/fig01.gif). – Tomasz Nurkiewicz Jul 06 '11 at 21:22
-1

Such simple optimizations are already included in the JVM runtime. If the JVM did naive access to instance variables, our Java applications would be turtle slow.

Such manual tuning is probably worthwhile for simpler JVMs though, e.g. Android.

irreputable
  • dex (android) bytecode is probably even more efficient... an uncompressed .dx is smaller than a jar-compressed .class, and the whole reason for dalvik over java mobile was performance (standard jvm is too bloaty for mobile devices) – Ajax Feb 09 '13 at 10:58