LongAdder
is definitely not suitable for your use case of unique integer generation, but you don't need to understand the implementation or dig into the intricacies of the java memory model to determine that. Just look at the API: it has no compound "increment and get" type methods that would allow you to increment the value and get the old/new value back, atomically.
In terms of adding values, it only offers void add(long x)
and void increment()
methods, but these don't return a value. You mention:
the incrementAndGet in LongAdder is not atomic
... but I don't see incrementAndGet
at all in LongAdder
. Where are you looking?
Your idea of:
- usage in a thread will consist of a w rite with an immediate read afterwards
- Requirement is that each read
from the counter (immediately after the writing) gets a unique
incremented value. It is not required that each retrieved counter
value is increasing in the same order as the different
threads(writers) increment the value.
Doesn't work even for AtomicLong
, unless by "write followed by a read" you mean calling the incrementAndGet
method. I think it goes without saying that two separate calls on an AtomicLong
or LongAdder
(or any other object really) can never be atomic without some external locking.
So the Java doc, in my opinion, is a bit confusing. Yes, you should not use sum()
for synchronization control, and yes "concurrent updates that occur while the sum is being calculated might not be incorporated"; however, the same is true of AtomicLong
and its get()
method. Increments that occur while calling get()
similarly may or may not be reflected in the value returned by get()
.
Now there are some guarantees that are weaker with LongAdder
compared to AtomicLong
. One guarantee you get with AtomicLong
is that a series of operations transition the object though a specific series of values, and where there is no guarantee on what specific value a thread will see, all the values should come from the true set of transition values.
For example, consider starting with an AtomicLong
with value zero, and two threads incrementing it concurrently, by 1 and 3 respetively. The final value will always be 4, and only two possible transition paths are possible: 0 -> 1 -> 4
or 0 -> 3 -> 4
. For a given execution, only one of those can have occurred and all concurrent reads will be consistent with that execution. That is, if any thread reads a 1
, then no thread may read a 3
and vice versa (of course, there is no guarantee that any thread will see a 1
or 3
at all, they may all see 0
or 4
.
LongCounter
doesn't provide that guarantee. Since the write process is not locked, and the read process adds together several values in a not-atomic fashion, it is possible for one thread to see a 1
and another to see a 3
in the same execution. Of course, it still doesn't synthesize "fake" values - you should never read a "2" for example.
Now that's a bit of a subtle concept and the Javadoc doesn't get it across well. They go with a pretty weak and not particularly formal statement instead. Finally, I don't think you can observe the behavior above with pure increments (rather than additions) since there is only one path then: 0 -> 1 -> 2 -> 3
, etc. So for increments, I think AtomicLong.get()
and LongCounter.sum()
have pretty much the same guarantees.
Something Useful
OK, so I'll give you something that might be useful. You can still implement what you want for efficiently, as long as you don't have strict requirements on the exact relationship between the counter value each thread gets and the order they were read.
Re-purpose the LongAdder Idea
You could make the LongAdder
idea work fine for unique counter generation. The underlying idea of LongAdder
is to spread the counter into N distinct counters (which live on separate cache lines). Any given call updates one of those counters based on the current thread ID2, and a read needs to sum the values from all counters. This means that writes have low contention, at the cost of a bit more complexity, and at a large cost to reads.
Now way the write works by design doesn't let you read the full LongAdder
value, but since you just want a unique value you could use the same code except with the top or bottom N bits3 set uniquely per counter.
Now the write can return the prior value, like getAndIncrement
and it will be unique because the fixed bits keep it unique among all counters in that object.
Thread-local Counters
A very fast and simple way is to use a unique value per thread, and a thread-local counter. When the thread local is initialized, it gets a unique ID from a shared counter (only once per thread), and then you combine that ID with a thread-local counter - for example, the bottom 24-bits for the ID, and the top 40-bits for the local counter1. This should be very fast, and more importantly essentially zero contention.
The downside is that the values of the counters won't have any specific relationship among threads (although they may still be strictly increasing within a thread). For example, a thread which has recently requested a counter value may get a much smaller one than a long existing value. You haven't described how you'll use these so I don't know if it is a problem.
Also, you don't have a single place to read the "total" number of counters allocated - you have to examine all the local counters to do that. This is doable if your application requires it (and has some of the same caveats as the LongAdder.sum()
function).
A different solution, if you want the numbers to be "generally increasing with time" across threads, and know that every thread requests counter values reasonably frequently, is to use a single global counter, which threads request a local "allocation" of a number of IDs, from which it will then allocate individual IDs in a thread-local manner. For example, threads may request 10 IDs, so that three threads will be allocated the range 0-9, 10-19, and 20-29, etc. They then allocate out of that range until it is exhausted and which point they go back to the global counter. This is similar to how memory allocators carve out chunks of a common pool which can then be allocated thread-local.
The example above will keep the IDs roughly in increasing order over time, and each threads IDs will be strictly increasing as well. It doesn't offer any strict guarantees though: a thread that is allocated the range 0-9, could very well sleep for hours after using 0, and then use "1" when the counters on other threads are much higher. It would reduce contention by a factor of 10.
There are a variety of other approaches you could use and mostof them trade-off contention reduction versus the "accuracy" of the counter assignment versus real time. If you had access to the hardware, you could probably use a quickly incrementing clock like the cycle counter (e.g., rdtscp
) and the core ID to get a unique value that is very closely tied to realtime (assuming the OS is synchronizing the counters).
1 The bit-field sizes should be chosen carefully based on the expected number of threads and per-thread increments in your application. In general, if you are constantly creating new threads and your application is long-lived, you may want to err on the side of more bits to the thread ID, since you can always detect a wrap of the local counter and get a new thread ID, so bits allocated to the thread ID can be efficiently shared with the local counters (but not the other way around).
2 The optimal is to use the 'CPU ID', but that's not directly accessible in Java (and even at the assembly level there is no fast and portable way to get it, AFAIK) - so the thread ID is used as a proxy.
3 Where N is lg2(number of counters)
.