How LongAdder performs better than AtomicLong

Question

I understand how Java's AtomicInteger works internally using a CAS (compare-and-swap) operation. When multiple threads try to update the value, the JVM uses the underlying CAS mechanism to attempt the update. If the update fails, the thread retries with the new value, but it never blocks.
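The retry behaviour described above can be sketched roughly as follows. This is only an illustration of the idea using the public compareAndSet method on AtomicLong (AtomicInteger works the same way); the class and method names CasLoopDemo and addWithCas are made up for the example, and the real JDK methods may be compiled down to a single hardware instruction instead of a visible loop.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper class, for illustration only.
final class CasLoopDemo {

    // Adds delta to the counter using an explicit compare-and-set retry loop.
    static long addWithCas(AtomicLong counter, long delta) {
        while (true) {
            long current = counter.get();      // volatile read of the current value
            long next = current + delta;
            if (counter.compareAndSet(current, next)) {
                return next;                   // our update won
            }
            // Another thread changed the value in between: re-read and retry.
            // The thread spins here but never blocks on a lock.
        }
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(40);
        System.out.println(addWithCas(counter, 2));
    }
}
```

Under heavy contention many threads lose the compareAndSet race and go around the loop again, which is the wasted work that LongAdder tries to avoid.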

Java 8 introduced a new class, LongAdder, which seems to perform better than AtomicInteger under high contention. Some blog posts claim that LongAdder performs better by maintaining internal cells. Does that mean LongAdder aggregates the values internally and updates it later? Could you please help me understand how LongAdder works?

http://stackoverflow.com/questions/27628538/multithreading-summing-large-number-of-values-atomically/27628873#27628873 — sol4me, Jun 07 '15 at 07:25
At first I think I misunderstood you. Reading your question again, I think you got the idea right. — aioobe, Jun 07 '15 at 07:30
Thanks ! I'm more interested in understanding how these internal cells are organized? say if 100 threads are trying to update the value, how many internal cells are created and how they are updated? — Sathish, Jun 07 '15 at 07:49
For such questions regarding implementation details, I would suggest you refer to the source. It's actually quite easy to read and understand. See my updated answer for a link to the latest revision. — aioobe, Jun 07 '15 at 11:19

aioobe · Accepted Answer · 2015-06-07T11:18:16.727

does that mean LongAdder aggregates the values internally and update it later?

Yes, if I understand your statement correctly.

Each Cell in a LongAdder is a variant of an AtomicLong. Having multiple such cells is a way of spreading out the contention and thus increasing throughput.

When the final result (sum) is to be retrieved, it just adds together the values of each cell.
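To make the idea concrete, here is a deliberately simplified sketch of a striped counter. It is not the Striped64 code: the class name StripedCounterSketch, the fixed cell count, and the use of the thread's hash code to pick a cell are all assumptions made for the example (the real class uses a per-thread probe value and padded cells).

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified illustration of striping; NOT the actual Striped64 implementation.
final class StripedCounterSketch {
    private final AtomicLong[] cells;

    StripedCounterSketch(int cellCount) {
        cells = new AtomicLong[cellCount];
        for (int i = 0; i < cellCount; i++) {
            cells[i] = new AtomicLong();
        }
    }

    // Each thread updates "its" cell, so threads mostly hit different cells.
    void add(long delta) {
        int index = Math.floorMod(Thread.currentThread().hashCode(), cells.length);
        cells[index].addAndGet(delta);
    }

    // The total is computed on demand by adding up every cell.
    long sum() {
        long total = 0;
        for (AtomicLong cell : cells) {
            total += cell.get();
        }
        return total;
    }

    // Runs several threads against one counter and returns the final sum.
    static long runDemo(int threads, int addsPerThread) {
        StripedCounterSketch counter = new StripedCounterSketch(4);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    counter.add(1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(8, 10_000));
    }
}
```

Once all writers have finished, the sum equals the total number of increments; while writers are still running, it is only a moving estimate.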

Much of the logic around how the cells are organized, how they are allocated, and so on can be seen in the source: http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/f398670f3da7/src/java.base/share/classes/java/util/concurrent/atomic/Striped64.java

In particular, the number of cells is bounded by the number of CPUs, not by the number of threads:

/** Number of CPUS, to place bound on table size */
static final int NCPU = Runtime.getRuntime().availableProcessors();
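As far as I can tell from that revision, the cell table starts with two entries and is doubled on contention only while it is still smaller than NCPU. The sketch below just reproduces that growth rule to show the resulting upper bound; CellBoundSketch and maxCells are hypothetical names, and the rule itself is my reading of the source rather than documented behaviour.

```java
// Hypothetical helper mirroring my reading of the growth rule in Striped64:
// the table starts with 2 cells and doubles while it is smaller than NCPU.
final class CellBoundSketch {

    static int maxCells(int ncpu) {
        int n = 2;              // assumed initial table size
        while (n < ncpu) {
            n <<= 1;            // table sizes stay powers of two
        }
        return n;
    }

    public static void main(String[] args) {
        int ncpu = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs: " + ncpu + ", max cells: " + maxCells(ncpu));
    }
}
```

So with 100 threads on an 8-core machine you would expect at most 8 cells under this reading, no matter how many threads are updating.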

score 8 · Answer 2 · edited Mar 25 '19 at 10:22

The primary reason it is "faster" is its contended performance. This is important because:

Under low update contention, the two classes have similar characteristics.

You'd use a LongAdder for very frequent updates, where atomic CAS operations and native calls to Unsafe would cause contention (see the source and volatile reads). There are also cache misses and false sharing to consider when using multiple AtomicLongs: although I have not looked at the class layout yet, there doesn't appear to be sufficient memory padding before the actual long field.

Under high contention, expected throughput of this class is significantly higher, at the expense of higher space consumption.

The implementation extends Striped64, which is a data holder for 64-bit values. The values are held in cells, which are padded (or striped), hence the name. Each operation on the LongAdder modifies the collection of values held in the Striped64. When contention occurs, a new cell is created and modified, so the original thread can finish concurrently with the contending one. When you need the final value, the values of all cells are simply added up.

Unfortunately, performance comes at a cost, which in this case is memory (as it often is). The Striped64 can grow considerably, up to the bound set by the number of CPUs, if a large number of threads and updates are thrown at it.
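A rough sketch of that "fall back to a cell on contention" behaviour is shown below. It is a simplification under several assumptions: the class name BaseThenCellsSketch is made up, the cell array has a fixed size instead of growing, cells are not padded, and every call tries the base value first, which the real class does not necessarily do once cells exist.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.stream.IntStream;

// Simplified illustration only; NOT the actual LongAdder/Striped64 logic.
final class BaseThenCellsSketch {
    private final AtomicLong base = new AtomicLong();
    // Cells are created lazily, only when a thread loses a race on base.
    private final AtomicReferenceArray<AtomicLong> cells = new AtomicReferenceArray<>(4);

    void add(long delta) {
        long b = base.get();
        if (base.compareAndSet(b, b + delta)) {
            return;                            // uncontended fast path
        }
        // Contention detected: move the update to a per-thread cell.
        int i = Math.floorMod(Thread.currentThread().hashCode(), cells.length());
        AtomicLong cell = cells.get(i);
        if (cell == null) {
            AtomicLong fresh = new AtomicLong();
            // If another thread installed a cell first, use theirs.
            cell = cells.compareAndSet(i, null, fresh) ? fresh : cells.get(i);
        }
        cell.addAndGet(delta);
    }

    long sum() {
        long total = base.get();
        for (int i = 0; i < cells.length(); i++) {
            AtomicLong cell = cells.get(i);
            if (cell != null) {
                total += cell.get();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        BaseThenCellsSketch counter = new BaseThenCellsSketch();
        IntStream.range(0, 100_000).parallel().forEach(i -> counter.add(1));
        System.out.println(counter.sum());
    }
}
```

With a single thread no cell is ever created, which matches the observation that the two classes behave similarly under low contention.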

Quote source: Javadoc for LongAdder

score 2 · Answer 3 · answered Apr 25 '19 at 03:30

AtomicLong uses CAS, which under heavy contention can lead to many wasted CPU cycles. LongAdder, on the other hand, uses a very clever trick to reduce contention between threads that are incrementing it. When we call increment(), LongAdder maintains an array of counters behind the scenes that can grow on demand, so when more threads are calling increment() concurrently, the array gets longer. Each entry in the array can be updated separately, which reduces the contention. Because of that, LongAdder is a very efficient way to increment a counter from multiple threads. The total is not available directly; it is only computed when we call the sum() method.
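For reference, a minimal usage sketch comparing the two classes side by side. LongAdderUsageDemo and its two helper methods are names made up for this example, and the parallel stream is just a convenient way to get several threads updating the same counter; it is not a benchmark.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Hypothetical demo class, for illustration only.
final class LongAdderUsageDemo {

    static long countWithAdder(int updates) {
        LongAdder adder = new LongAdder();
        IntStream.range(0, updates).parallel().forEach(i -> adder.increment());
        return adder.sum();        // adds up the internal values on demand
    }

    static long countWithAtomic(int updates) {
        AtomicLong atomic = new AtomicLong();
        IntStream.range(0, updates).parallel().forEach(i -> atomic.incrementAndGet());
        return atomic.get();       // single value, always current
    }

    public static void main(String[] args) {
        System.out.println(countWithAdder(1_000_000));
        System.out.println(countWithAtomic(1_000_000));
    }
}
```

Both approaches count every update. The difference is that sum() is not an atomic snapshot if other threads are still updating, so LongAdder suits statistics and counters that are written often and read rarely, while AtomicLong is the better fit when each update needs to see the exact current value.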
