AtomicInteger implementation and code duplication

Question

Warning: the question is a little long, but the part below the separation line is for curiosity only.

Oracle's JDK 7 implementation of AtomicInteger includes the following methods:

public final int addAndGet(int delta) {
    for (;;) {
        int current = get();
        int next = current + delta;         // Only difference
        if (compareAndSet(current, next))
            return next;
    }
}

public final int incrementAndGet() {
    for (;;) {
        int current = get();
        int next = current + 1;             // Only difference
        if (compareAndSet(current, next))
            return next;
    }
}

It seems clear that the second method could have been written:

public final int incrementAndGet() {
    return addAndGet(1);
}

There are several other examples of similar code duplication in that class. I can't think of any reasons to do that but performance considerations (*). And I am pretty sure the authors did some in-depth testing before settling on that design.

Why (or in what circumstances) would the first code perform better than the second?

(*) I could not resist but write a quick micro benchmark. It shows (post-JIT) a systematic gap of 2-4% performance in favour of addAndGet(1) vs incrementAndGet() (that is admittedly small, but it is very consistent). I can't really explain that result either to be honest...

Output:

incrementAndGet(): 905
addAndGet(1): 868
incrementAndGet(): 902
addAndGet(1): 863
incrementAndGet(): 891
addAndGet(1): 867
...

Code:

public static void main(String[] args) throws Exception {
    final int size = 100_000_000;
    long start, end;
    AtomicInteger ai;

    System.out.println("JVM warmup");
    for (int j = 0; j < 10; j++) {
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size / 10; i++) {
            ai.addAndGet(1);
        }
        end = System.nanoTime();
        System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size / 10; i++) {
            ai.incrementAndGet();
        }
        end = System.nanoTime();
        System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));
    }


    System.out.println("\nStart measuring\n");

    for (int j = 0; j < 10; j++) {
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size; i++) {
            ai.incrementAndGet();
        }
        end = System.nanoTime();
        System.out.println("incrementAndGet(): " + ((end - start) / 1_000_000));
        start = System.nanoTime();
        ai = new AtomicInteger();
        for (int i = 0; i < size; i++) {
            ai.addAndGet(1);
        }
        end = System.nanoTime();
        System.out.println("addAndGet(1): " + ((end - start) / 1_000_000));
    }
}

Though I doubt this will actually reveal the answer, you could disassemble both functions and show the disassembled Java bytecode here. It *might* reveal something. — Alex D, Feb 28 '13 at 18:50

score 9 · Accepted Answer · edited May 23 '17 at 12:15

9

I'll give new supposition. If we look into byte code of AtomicInteger we will see, that the main difference between them is that addAndGet uses iload_ instruction, and incrementAndGet uses iconst_ instruction:

public final int addAndGet(int);
   ...
   4:   istore_2
   5:   iload_2
   6:   iload_1
   7:   iadd

public final int incrementAndGet();
   ...
   4:   istore_1
   5:   iload_1
   6:   iconst_1
   7:   iadd

It seems, that iconst_+iadd translates as INC instruction, due to iload_...iadd as ADD instruction. This all relates to commonly known question about ADD 1 vs INC and so on:

Relative performance of x86 inc vs. add instruction

Is ADD 1 really faster than INC ? x86

This could be the answer, why addAndGet is slightly faster than incrementAndGet

edited May 23 '17 at 12:15

Community

1
1

answered Feb 28 '13 at 19:23

Andremoniy

34,031
20
135
241

Your initial answer was [spot on](http://cs.oswego.edu/pipermail/concurrency-interest/2013-February/010893.html)! Except that it was not implemented in 7u11 yet apparently (which is the version I am using). – assylias Feb 28 '13 at 22:18
See also [the bug database](http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7023898). – assylias Feb 28 '13 at 22:19

score 5 · Answer 2 · answered Feb 28 '13 at 19:48

Out of curiosity, here is the assembly code generated by the JIT. In summary, the main difference is:

incrementAndGet

mov    r8d,eax
inc    r8d                ;*iadd

addAndGet

mov    r9d,r8d
add    r9d,eax            ;*iadd

The rest of the code is essentially the same. That confirms that:

the methods are not intrinsics and don't call each other under the hood
the only difference is INC vs ADD 1

I am not good enough at reading assembly to know why that makes a difference. And that does not really answer my initial question.

Full listing (incrementAndGet):

  # {method} 'incrementAndGet' '()I' in 'java/util/concurrent/atomic/AtomicInteger'
  #           [sp+0x20]  (sp of caller)
  0x00000000026804c0: mov    r10d,DWORD PTR [rdx+0x8]
  0x00000000026804c4: shl    r10,0x3
  0x00000000026804c8: cmp    rax,r10
  0x00000000026804cb: jne    0x0000000002657b60  ;   {runtime_call}
  0x00000000026804d1: data32 xchg ax,ax
  0x00000000026804d4: nop    DWORD PTR [rax+rax*1+0x0]
  0x00000000026804dc: data32 data32 xchg ax,ax
[Verified Entry Point]
  0x00000000026804e0: sub    rsp,0x18
  0x00000000026804e7: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@-1 (line 204)
  0x00000000026804ec: mov    eax,DWORD PTR [rdx+0xc]  ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
  0x00000000026804ef: mov    r8d,eax
  0x00000000026804f2: inc    r8d                ;*iadd
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@7 (line 205)
  0x00000000026804f5: lock cmpxchg DWORD PTR [rdx+0xc],r8d
  0x00000000026804fb: sete   r11b
  0x00000000026804ff: movzx  r11d,r11b          ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
  0x0000000002680503: test   r11d,r11d
  0x0000000002680506: je     0x0000000002680520  ;*iload_2
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@18 (line 207)
  0x0000000002680508: mov    eax,r8d
  0x000000000268050b: add    rsp,0x10
  0x000000000268050f: pop    rbp
  0x0000000002680510: test   DWORD PTR [rip+0xfffffffffdbafaea],eax        # 0x0000000000230000
                                                ;   {poll_return}
  0x0000000002680516: ret    
  0x0000000002680517: nop    WORD PTR [rax+rax*1+0x0]  ; OopMap{rdx=Oop off=96}
                                                ;*goto
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@20 (line 208)
  0x0000000002680520: test   DWORD PTR [rip+0xfffffffffdbafada],eax        # 0x0000000000230000
                                                ;*goto
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@20 (line 208)
                                                ;   {poll}
  0x0000000002680526: mov    r11d,DWORD PTR [rdx+0xc]  ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
  0x000000000268052a: mov    r8d,r11d
  0x000000000268052d: inc    r8d                ;*iadd
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@7 (line 205)
  0x0000000002680530: mov    eax,r11d
  0x0000000002680533: lock cmpxchg DWORD PTR [rdx+0xc],r8d
  0x0000000002680539: sete   r11b
  0x000000000268053d: movzx  r11d,r11b          ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)
  0x0000000002680541: test   r11d,r11d
  0x0000000002680544: je     0x0000000002680520  ;*ifeq
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@15 (line 206)
  0x0000000002680546: jmp    0x0000000002680508

Full listing (addAndGet):

  # {method} 'addAndGet' '(I)I' in 'java/util/concurrent/atomic/AtomicInteger'
  # this:     rdx:rdx   = 'java/util/concurrent/atomic/AtomicInteger'
  # parm0:    r8        = int
  #           [sp+0x20]  (sp of caller)
  0x0000000002680d00: mov    r10d,DWORD PTR [rdx+0x8]
  0x0000000002680d04: shl    r10,0x3
  0x0000000002680d08: cmp    rax,r10
  0x0000000002680d0b: jne    0x0000000002657b60  ;   {runtime_call}
  0x0000000002680d11: data32 xchg ax,ax
  0x0000000002680d14: nop    DWORD PTR [rax+rax*1+0x0]
  0x0000000002680d1c: data32 data32 xchg ax,ax
[Verified Entry Point]
  0x0000000002680d20: sub    rsp,0x18
  0x0000000002680d27: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@-1 (line 233)
  0x0000000002680d2c: mov    eax,DWORD PTR [rdx+0xc]  ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
  0x0000000002680d2f: mov    r9d,r8d
  0x0000000002680d32: add    r9d,eax            ;*iadd
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@7 (line 234)
  0x0000000002680d35: lock cmpxchg DWORD PTR [rdx+0xc],r9d
  0x0000000002680d3b: sete   r11b
  0x0000000002680d3f: movzx  r11d,r11b          ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
  0x0000000002680d43: test   r11d,r11d
  0x0000000002680d46: je     0x0000000002680d60  ;*iload_3
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@18 (line 236)
  0x0000000002680d48: mov    eax,r9d
  0x0000000002680d4b: add    rsp,0x10
  0x0000000002680d4f: pop    rbp
  0x0000000002680d50: test   DWORD PTR [rip+0xfffffffffdbaf2aa],eax        # 0x0000000000230000
                                                ;   {poll_return}
  0x0000000002680d56: ret    
  0x0000000002680d57: nop    WORD PTR [rax+rax*1+0x0]  ; OopMap{rdx=Oop off=96}
                                                ;*goto
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@20 (line 237)
  0x0000000002680d60: test   DWORD PTR [rip+0xfffffffffdbaf29a],eax        # 0x0000000000230000
                                                ;*goto
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@20 (line 237)
                                                ;   {poll}
  0x0000000002680d66: mov    r11d,DWORD PTR [rdx+0xc]  ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
  0x0000000002680d6a: mov    r9d,r11d
  0x0000000002680d6d: add    r9d,r8d            ;*iadd
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@7 (line 234)
  0x0000000002680d70: mov    eax,r11d
  0x0000000002680d73: lock cmpxchg DWORD PTR [rdx+0xc],r9d
  0x0000000002680d79: sete   r11b
  0x0000000002680d7d: movzx  r11d,r11b          ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@12 (line 235)
  0x0000000002680d81: test   r11d,r11d
  0x0000000002680d84: je     0x0000000002680d60  ;*ifeq
                                                ; - java.util.concurrent.atomic.AtomicInteger::addAndGet@15 (line 235)
  0x0000000002680d86: jmp    0x0000000002680d48

how to generate assembly from jit? – Sudip Bhandari May 31 '17 at 09:40 — Sudip Bhandari, May 31 '17 at 09:40

Andrew Mao · Answer 3 · 2013-02-28T18:46:04.513

1

To expand on @AlexeiKaigorodov's answer, if this is real Java code, it would be faster because it would eliminate an extra frame on the call stack. This makes it run quicker (why not?) and may have implications that multiple concurrent calls to the loop are less likely to fail, causing the loop to run repeatedly. (Though, I can't come up with any such reasons off the top of my head.)

Though, through your micro-benchmarks, it's possible that the code is not real and the incrementAndGet() method is implemented in native code in the way you specified, or that both are just intrinsic instructions (delegated to lock:xadd on x86 for example). However, it's generally rather hard to divine what the JVM is doing all the time, and there could be other things that are causing this.

edited Feb 28 '13 at 18:46

answered Feb 28 '13 at 18:27

Andrew Mao

35,740
23
143
224

Not sure I understand this part "*less likely that multiple concurrent calls compareAndSet() will fail and the loop will have to run again.*": `compareAndSet` is run on the same `this` object in any case... – assylias Feb 28 '13 at 18:28
`incrementAndGet` runs slower than `addAndGet`, so what elimination of an extra frame you are talking about? – Andremoniy Feb 28 '13 at 18:43
I'm just speculating that if it was all Java code, that would be the reason for the code duplication. My second paragraph suggests other possibilities... – Andrew Mao Feb 28 '13 at 18:46
@Eng.Fouad I'm not so sure - I would expect the JIT to inline such a short method (especially a final one). – assylias Feb 28 '13 at 18:53

Aleš · Answer 4 · 2013-03-01T03:56:28.417

0

Just to complete the discussion, the same question was also asked in the Concurrency-interest -- Discussion list for JSR-166 mailing list almost at the same time as here.

Here is the start of the thread - [concurrency-interest] AtomicInteger implementation discussing the AtomicInteger implementation.

edited Mar 01 '13 at 03:56

answered Mar 01 '13 at 01:20

Aleš

8,896
8
62
107

score -3 · Answer 5 · answered Feb 28 '13 at 18:24

-3

The reason is that they preferred to make code faster, in expense of code size.

I am sure, the sources are real. If they were intrinsics, they would be marked as native.

answered Feb 28 '13 at 18:24

Alexei Kaigorodov

13,189
1
21
38

I'm sure that's the case - the question is "why would that be faster?" – assylias Feb 28 '13 at 18:24
1

Alexei, you are wrong about "if they were intrinsics, they would be markerd as native" - it seems, that you do not understand what "intrinsics" are. – Andremoniy Feb 28 '13 at 18:26
@Andremoniy I don't really understand "intrinsics" either. could you add that in your answer? – irreputable Feb 28 '13 at 18:27
1

@irreputable, read this one: http://vanillajava.blogspot.ru/2012/11/java-intrinsics-and-performance.html – Andremoniy Feb 28 '13 at 18:30
@Andremoniy yes I don't understand what you mean, since there is now standard meaning of "intrinsics" in java – Alexei Kaigorodov Feb 28 '13 at 18:31
@Andremoniy I mean "standard", that is, described in the specification. Neither JLS nor JVMS contain such a notion. – Alexei Kaigorodov Feb 28 '13 at 18:40
@AlexeiKaigorodov I think Andremoniy has a point - see for example [this post](http://stackoverflow.com/questions/15085294/java-lang-math-log-replaced-by-intrinsic-call-why-not-java-lang-math-exp). – assylias Feb 28 '13 at 18:47
@assylias Andremoniy has 2 points: 1) intrinsics exist (which is true), and 2) intrinsics are standard notion (which is wrong). – Alexei Kaigorodov Mar 01 '13 at 03:58

AtomicInteger implementation and code duplication

5 Answers5

Linked