20

So my question is about variable access speed in Java. Today in my "CS" lecture (if you can call it that) the teacher presented an example of a list similar to the following:

public class ListExample<T> {
    private Node<T> head;
    private Node<T> tail;

    private class Node<T> { /* ... */ }

    public void append(T content) {
        if (isEmpty()) {
            Node<T> dummy = new Node<T>(content);
            head = dummy;
            tail = dummy;

            head.setNext(head);
            // or this
            dummy.setNext(dummy);

        } else { /* ... */ }
    }

    // more methods
    // ...
}

My question is: would the call to `head.setNext(head)` be slower than `dummy.setNext(dummy)`, even if the difference is not noticeable? I was wondering this since `head` is obviously an instance variable of the class and `dummy` is local, so would the local access be faster?

Joseph Adams
  • 1
    Since `dummy` would be on the heap along with `head` I would say their access time is the same. You are still having to access the heap in both scenarios. – C.B. Feb 06 '14 at 20:17
  • 3
    How could you say that X is slower than Y if the difference is "not noticeable" (implying not measurable)? Regardless, when in doubt: [**measure it.**](http://stackoverflow.com/q/504103/139010) – Matt Ball Feb 06 '14 at 20:17
  • Since they are pointing to the same object, I'd say that there will be no difference – Svetlin Zarev Feb 06 '14 at 20:18
  • 4
    [Premature optimization is the root of all evil.](http://www.youtube.com/watch?v=4bQOSRm9YiQ) – Laf Feb 06 '14 at 20:19
  • @MattBall with that I really meant if there was any difference in the number of instructions that would need to be executed to get the value of either `head` or `dummy` (not JVM instructions since those will obviously differ [`aload` and something]) – Joseph Adams Feb 06 '14 at 20:21
  • 2
    For `head` it will have to dereference the `this` pointer but for `dummy` it takes the value off the stack. – DrYap Feb 06 '14 at 20:23
  • In the interpreter the fastest access is to a local, next is an instance variable, slowest is a static. With the JITC, however, that can dance around a bit. – Hot Licks Feb 06 '14 at 20:39
  • @C.B. - "dummy" would be on the JVM stack, not in the heap. DrYap has it right. – Hot Licks Feb 06 '14 at 20:42
  • @HotLicks in regards to the objects or their references? – C.B. Feb 06 '14 at 20:44
  • @C.B. - Objects are never stored in an instance variable or a local variable, only references are stored there. – Hot Licks Feb 06 '14 at 20:57
  • @HotLicks i will be removing these comments as they detract from the question, but I was saying once you have the references for each you must access the heap. I thought retrieving the reference for `head` would be negligible – C.B. Feb 06 '14 at 20:59
  • @C.B. - Yeah, this is arguing over angels on a pinhead. In a well-designed JVM neither is expensive enough to worry about (though in some early JVMs there was cause for worry). – Hot Licks Feb 06 '14 at 21:06
  • @C.B. - But it should be noted that the same discussion applies to accessing a scalar value rather than a reference. – Hot Licks Feb 06 '14 at 21:30
  • I don't see how this question is "opinion-based". I lead the development of the Java implementation for IBM iSeries. My answer and comments here are based on facts. – Hot Licks Jan 09 '20 at 02:41

7 Answers

23

Ok, I've written a micro-benchmark (as suggested by @Joni & @MattBall) and here are the results (average microseconds per access) for 1 x 1000000000 accesses each to a local and an instance variable:

Average time for instance variable access: 5.08E-4
Average time for local variable access: 4.96E-4

For 10 x 1000000000 accesses each:

Average time for instance variable access: 4.723E-4
Average time for local variable access: 4.631E-4

For 100 x 1000000000 accesses each:

Average time for instance variable access: 5.050300000000002E-4
Average time for local variable access: 5.002400000000001E-4

So it seems that local variable accesses are indeed marginally faster than instance variable accesses (even if both point to the same object).

Note: I didn't want to find this out because of something I wanted to optimize; it was just pure interest.

P.S. Here is the code for the micro-benchmark:

public class AccessBenchmark {
    private final long N = 1000000000;
    private static final int M = 1;

    private LocalClass instanceVar;

    private class LocalClass {
        public void someFunc() {}
    }

    public double testInstanceVar() {
        // System.out.println("Running instance variable benchmark:");
        instanceVar = new LocalClass();

        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            instanceVar.someFunc();
        }

        long elapsed = System.currentTimeMillis() - start;

        double avg = (elapsed * 1000.0) / N;

        // System.out.println("elapsed time = " + elapsed + "ms");
        // System.out.println(avg + " microseconds per execution");

        return avg;
    }

    public double testLocalVar() {
        // System.out.println("Running local variable benchmark:");
        instanceVar = new LocalClass();
        LocalClass localVar = instanceVar;

        long start = System.currentTimeMillis();
        for (int i = 0 ; i < N; i++) {
            localVar.someFunc();
        }

        long elapsed = System.currentTimeMillis() - start;

        double avg = (elapsed * 1000.0) / N;

        // System.out.println("elapsed time = " + elapsed + "ms");
        // System.out.println(avg + " microseconds per execution");

        return avg;
    }

    public static void main(String[] args) {
        AccessBenchmark bench;

        double[] avgInstance = new double[M];
        double[] avgLocal = new double[M];

        for (int i = 0; i < M; i++) {
            bench = new AccessBenchmark();

            avgInstance[i] = bench.testInstanceVar();
            avgLocal[i] = bench.testLocalVar();

            System.gc();
        }

        double sumInstance = 0.0;
        for (double d : avgInstance) sumInstance += d;
        System.out.println("Average time for instance variable access: " + sumInstance / M);

        double sumLocal = 0.0;
        for (double d : avgLocal) sumLocal += d;
        System.out.println("Average time for local variable access: " + sumLocal / M);
    }
}
Joseph Adams
  • 3
    It should be noted that, with differences this small, "trivial" things like cache alignment could throw off the results. – Hot Licks Feb 06 '14 at 21:29
  • @HotLicks: Sure, but this is something a benchmark framework should already handle (Caliper allocates some random-length garbage before measurements because of this, so I'd hope JMH does it too). In general I rather agree: any tiny change can throw it off; I'm just surprised how consistent the results are. – maaartinus Feb 06 '14 at 22:14
  • I am surprised that the results are so consistent. I'm not surprised at the results -- they're about what I'd expect, given that the vast majority of the time is spent in the instance method call/return vs the variable reference. – Hot Licks Feb 06 '14 at 22:19
  • 17
    @Joseph: I think you draw a totally wrong conclusion from those numbers. What the numbers actually indicate is this: **There is no significant difference in performance between the two alternatives.** The difference between your measurements is so tiny that it is well inside the margin of error of any benchmark. – Lii Sep 11 '15 at 08:36
16

In general, an access to an instance variable (of the `this` object) requires an `aload_0` (to load `this` to the top of the stack) followed by a `getfield`. Referencing a local variable requires only the `aload_n` to pull the value out of its assigned location in the stack.

Further, `getfield` must reference the class definition to determine where in the class (what offset) the value is stored. This could be several additional hardware instructions.
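
To make this concrete, here is a minimal sketch (the class and method names are made up for illustration); the bytecode comments paraphrase typical `javap -c` output, and the exact constant-pool indices will differ:

class AccessSketch {
    Object field;

    // Instance variable: load `this`, then fetch the field.
    Object viaField() {
        return field;               // aload_0; getfield #2; areturn
    }

    // Local variable (here a parameter): a single load from its stack slot.
    Object viaLocal(Object local) {
        return local;               // aload_1; areturn
    }
}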

Even with a JITC it's unlikely that the local reference (which would normally be zero/one hardware operation) would ever be slower than the instance field reference (which would have to be at least one operation, maybe 2-3).

(Not that this matters all that much -- the speed of both is quite good, and the difference could only become significant in very bizarre circumstances.)

Hot Licks
10

As noted in the comments, I don't think there's a difference in the time taken. I think what you might be referring to is better exemplified in the Java SE codebase. For example, in java.lang.String:

public void getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) {
    //some code you can check out

    int j = dstBegin;
    int n = srcEnd;
    int i = srcBegin;
    char[] val = value;               /* avoid getfield opcode */

    while (i < n) {
        dst[j++] = (byte)val[i++];
    }
}

In the above code, `value` is an instance variable, and since the while loop accesses individual elements of `value` repeatedly, the JDK authors first copied the reference into a local variable, avoiding a `getfield` on every iteration (the array itself of course stays on the heap).
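
Here is a hedged sketch of the same pattern outside of String (class and field names are made up for illustration):

class Checksum {
    private char[] value = new char[1024];

    int sum() {
        char[] val = value;          // one field read (getfield) up front...
        int sum = 0;
        for (int i = 0; i < val.length; i++) {
            sum += val[i];           // ...instead of one per loop iteration
        }
        return sum;
    }
}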

You can also check out the knowledge shared by Jon Skeet, Vivin and a few others on this answer.

aasu
  • 2
    This is an interesting point, it suggests that there is in fact a difference. Though this String code may be old, and there might be some optimizations that the JIT compiler does now that it didn't at the time of writing. – Joseph Adams Feb 06 '14 at 20:41
  • 2
    When setting a local just for this optimization, I wonder how many accesses make it worthwhile (unlike the OP, who already had the local). – NateS May 07 '17 at 09:35
4

From a microarchitecture perspective, reading a local variable may be cheaper because it's likely in a register, or at least in the CPU cache. In general, reading an instance variable may cause an expensive cache miss. In this case, though, the variable was just written, so it will likely be in the cache anyway. You could write a micro-benchmark to find out if there's any difference.
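
For example, a minimal sketch of such a micro-benchmark using JMH (this assumes the `org.openjdk.jmh` dependency is available, and the class/method names are made up for illustration; note the JIT may hoist the non-volatile field read out of the loop, so treat the results with care):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class FieldVsLocalBenchmark {
    private static final int N = 1_000_000;
    private Object field = new Object();

    @Benchmark
    @OperationsPerInvocation(N)
    public int viaField() {
        int sum = 0;
        for (int i = 0; i < N; i++) {
            sum += field.hashCode();   // getfield on every iteration
        }
        return sum;
    }

    @Benchmark
    @OperationsPerInvocation(N)
    public int viaLocal() {
        Object local = field;          // single getfield, then locals only
        int sum = 0;
        for (int i = 0; i < N; i++) {
            sum += local.hashCode();   // only an aload for the receiver each time
        }
        return sum;
    }
}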

Joni
  • It's clear that a local variable will give you better cache coherence, but how about comparing local instance access vs static access? Would static fields and methods run a greater or lesser chance to not be in cache at any given moment? – scriptocalypse Jun 23 '15 at 19:05
-1

I think using `dummy` might be, at the very most, 1 cycle faster, assuming it was left in a register, but it depends on the specific CPU architecture, what `setNext` looks like, and the JVM you're using, and it's really unpredictable how the code might look once in its final JIT'd form. The JVM could potentially see that `head == dummy`, and if so, the executed code for both cases would be identical. This is much, much too tiny a case to worry about.

Boann
-1

I can assure you that whatever performance gain one might get from this will be offset by the headache of looking at confusingly written code. Let the compiler figure this out. I will concede that, all things being equal, the local variable is probably slightly faster, if only because there are fewer bytecode instructions involved. However, who is to say that future versions of the JVM won't change this?

In short, write code that is easy to read first. If, after that, you have a performance concern, profile.

HesNotTheStig
-2

When in doubt, look at the bytecode generated (e.g. with `javap -c`):

public void append(java.lang.Object);
 Code:
  0:    new #2; //class ListExample$Node
  3:    dup
  4:    aload_0
  5:    aload_1
  6:    invokespecial   #3; //Method ListExample$Node."<init>":(LListExample;Ljava/lang/Object;)V
  9:    astore_2
  10:   aload_0
  11:   aload_2
  12:   putfield    #4; //Field head:LListExample$Node;
  15:   aload_0
  16:   aload_2
  17:   putfield    #5; //Field tail:LListExample$Node;
  20:   aload_0
  21:   getfield    #4; //Field head:LListExample$Node;
  24:   aload_0
  25:   getfield    #4; //Field head:LListExample$Node;
  28:   invokevirtual   #6; //Method ListExample$Node.setNext:(LListExample$Node;)V
  31:   aload_2
  32:   aload_2
  33:   invokevirtual   #6; //Method ListExample$Node.setNext:(LListExample$Node;)V
  36:   return

}

Either you get an `aload` followed by a `getfield`, or 2 x `aload`. Seems to me they would be identical.

Niels Bech Nielsen
  • Thanks for your answer; have you got any idea why you would need 2x `aload_2`? Shouldn't 1 be enough to get the ref to `dummy`? – Joseph Adams Feb 06 '14 at 20:32
  • 4
    @JosephAdams The compiler does almost no optimisation. All the optimisation is done by the JIT at runtime so the actual native code generated might look completely different. – Peter Lawrey Feb 06 '14 at 20:33
  • 7
    This tells you absolutely nothing about the runtime speed. – Boann Feb 06 '14 at 20:37
  • Just because they're the same number of bytecodes you think they would be the same speed?? – Hot Licks Feb 06 '14 at 20:40
  • @HotLicks No, but generally if there are fewer bytecodes (and they do [about] the same thing) I'd say it's more probable that it runs faster. – Joseph Adams Feb 06 '14 at 20:42
  • @JosephAdams: I guess there's some correlation, but you can never be sure. [Recently](http://stackoverflow.com/questions/21567248/strange-performance-drop-of-jdk8-localdate-toepochday), I wondered how it comes that an algorithm using a couple of divisions can be relatively fast, but now I see there are no divisions after JIT. – maaartinus Feb 06 '14 at 22:18