10

I am very curious about that. If you use AtomicIntegerArray with values 0 and 1 you can accomplish the same thing as an array of AtomicBoolean. Example:

final AtomicIntegerArray array1 = new AtomicIntegerArray(10);
array1.compareAndSet(3, 0, 1); // 0 = false and 1 = true

// exactly the same thing as:

final AtomicBoolean[] array2 = new AtomicBoolean[10];
for(int i = 0; i < array2.length; i++)
     array2[i] = new AtomicBoolean(false);
array2[3].compareAndSet(false, true);

Which one do you think is faster and better?

Neyt
  • 479
  • 1
  • 12
  • 27
JohnPristine
  • 3,485
  • 5
  • 30
  • 49
  • 9
    For the "faster" bit, opinions are useless and benchmarks are mandatory. –  Mar 06 '14 at 22:01
  • An int and a boolean are not the same to start with, so there is no "better one"; if you want boolean, use booleans... – fge Mar 06 '14 at 22:04
  • 1
    AtomicIntegerArray has an internal int[] array. If you create an array of AtomicBooleans, you will probably be consuming more memory, but internally its "boolean" value is backed by a primitive, an int, so each value consumes the same memory. You really need to test this. I'm guessing that if you're talking about atomic transactions, the get/set speed of AtomicIntegerArray will be slightly slower since the int value is wrapped inside several other calls. AtomicBoolean uses a simple ?: to convert 1/0 to true/false. – mttdbrd Mar 06 '14 at 22:15
  • It just occurred to me too that in testing you may find that sorting the arrays (if they're sufficiently large) influences the speed of processing. Don't fall victim to a failure of branch prediction and conclude that one method is better/faster than the other. See this thread for information on why sorted arrays can be processed faster: http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array?rq=1 – mttdbrd Mar 06 '14 at 22:50
  • @mttdbrd I don't see where branch prediction would make a difference in the given code/scenario. CAS does use branches internally for obvious reasons, but whether the array is sorted or not won't influence the branch result there at all. – Voo Mar 06 '14 at 23:45
  • "If the arrays are sufficiently large." – mttdbrd Mar 06 '14 at 23:55
  • @mttdbrd If there are no branches that would be influenced based on the value, sorting the array won't make a difference. The only conditional in there is the CAS loop which *doesn't* depend on the value, but only on whether there was contention or not. – Voo Mar 07 '14 at 11:52
  • @Voo please read the thread I posted above. I'm not talking about conditionals inside the AtomicBoolean or AtomicIntegerArray themselves. If there are tests for conditionals in your own code with big arrays, a single if statement testing values in a loop will be significantly slower if it's not sorted. Again, please read the thread and the code sample in the question. – mttdbrd Mar 07 '14 at 14:32
  • @mttdbrd I implemented a MIPS CPU with simple static branch prediction back in college, I think I get the basic idea :-) But I still don't see how this is applicable to the given problem (and sorting arrays doesn't necessarily help with branch prediction, it all depends on the code). Sure *surrounding* code may be susceptible to effects of it, but then we could also tell him to avoid false sharing or that cache-oblivious algorithms are a good idea. – Voo Mar 07 '14 at 14:47
  • @Voo The point was very simple. If OP is going to test this in a production environment, be sure that branch prediction failure is NOT the cause of any slowness. If the arrays are large and there are lots of comparisons, then OP might think any slowness was the result of using the _classes themselves_ and not the code around it. The question here has to do with speed and so I was suggesting that OP make sure he provides the best possible conditions to test speed. And I think you still haven't read that thread or the answer. Sorting arrays DOES help with branch prediction. – mttdbrd Mar 07 '14 at 15:01
  • @mttdbrd Both code versions would run under the same circumstances and would profit equally from branch prediction anyhow. Also, no, sorting an array is no guarantee of helping with branch prediction - it all depends on the code (the code in the linked question can easily be optimized to use a cmov or hoist the condition out, I'm very surprised that gcc doesn't do that). Or another example: [This code](http://paste.pound-python.org/show/6zad7awWaSj6I7JNAlZQ/) here does depend on branch prediction, but sorting the array is completely incidental to any performance improvements (and can make it worse) – Voo Mar 07 '14 at 15:11
  • Thought - might there be an issue with cache locality when the arrays are being accessed via threads running on different cores? In AtomicIntegerArray you get more values on a cache line, increasing the likelihood that two threads accessing different but 'close' entries will interfere. – selig Mar 17 '14 at 11:30

3 Answers

12

Which one do you think is faster and better?

Interesting question. The speed difference would probably only be visible if you are doing a very large number of cycles. Otherwise, worrying about it smacks of premature optimization. I would pick the pattern that is the cleanest and most easily maintained.

Under the covers, both methods use Unsafe.compareAndSwapInt(...), so the performance may be very similar. Since there is no blocking when accessing volatile storage, this is not about collisions. The AtomicBoolean array will certainly have a larger number of objects associated with it – each with its own volatile storage. Also, under the covers AtomicBoolean stores the boolean value as an int, so there are no savings there.
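
For reference, here is a paraphrased sketch of what a JDK-7-era AtomicBoolean looks like under the covers (not the actual JDK source, and the class name is just for illustration): the boolean lives in a volatile int, and compareAndSet maps true/false to 1/0 before delegating to the same Unsafe.compareAndSwapInt.

import sun.misc.Unsafe;
import java.lang.reflect.Field;

// Paraphrased sketch of what AtomicBoolean does internally (JDK 7 era),
// not the actual JDK source: the boolean lives in a volatile int and the
// CAS is the same Unsafe.compareAndSwapInt that AtomicIntegerArray uses.
public class AtomicBooleanSketch {
    private static final Unsafe UNSAFE;
    private static final long VALUE_OFFSET;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            VALUE_OFFSET = UNSAFE.objectFieldOffset(
                    AtomicBooleanSketch.class.getDeclaredField("value"));
        } catch (Exception e) {
            throw new Error(e);
        }
    }

    private volatile int value; // 0 = false, 1 = true

    public final boolean get() {
        return value != 0;
    }

    public final boolean compareAndSet(boolean expect, boolean update) {
        int e = expect ? 1 : 0;
        int u = update ? 1 : 0;
        return UNSAFE.compareAndSwapInt(this, VALUE_OFFSET, e, u);
    }
}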

My instinct tells me to use the AtomicIntegerArray. Less code for you to write which typically means more reliance on the JDK to do the right thing. To figure it out you would have to test some large number of iterations on your production architecture to know for sure. I suspect the difference is going to be negligible and hard to measure.

Not a great answer but hopefully something helpful here.

Edit:

So I just ran some tests and I can't see any significant differences. Here's my little test program. It used 100 threads and ran 10 million iterations, and the two variants were within 0-10% of each other. As @mttdbrd points out, this is in no way a "real life" test. Only by benchmarking this in production, with the code actually running the way it should, will you truly know whether there is a difference.
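
A minimal sketch of the kind of micro-benchmark described (100 threads doing CAS operations against either structure and timing the result) might look like the code below; the class name, the constants, and the lack of a warm-up pass are illustrative assumptions, not the actual test program.

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Illustrative sketch only, not the answer's actual program.
public class AtomicArrayBench {
    static final int THREADS = 100;
    static final int ITERATIONS = 10_000_000;
    static final int SIZE = 1000;

    public static void main(String[] args) throws InterruptedException {
        final AtomicIntegerArray ints = new AtomicIntegerArray(SIZE);
        final AtomicBoolean[] bools = new AtomicBoolean[SIZE];
        for (int i = 0; i < SIZE; i++) bools[i] = new AtomicBoolean();

        long intMillis = time(new Runnable() {
            public void run() {
                for (int i = 0; i < ITERATIONS; i++) {
                    int idx = i % SIZE;
                    ints.compareAndSet(idx, 0, 1); // flip 0 -> 1
                    ints.compareAndSet(idx, 1, 0); // and back again
                }
            }
        });
        long boolMillis = time(new Runnable() {
            public void run() {
                for (int i = 0; i < ITERATIONS; i++) {
                    int idx = i % SIZE;
                    bools[idx].compareAndSet(false, true);
                    bools[idx].compareAndSet(true, false);
                }
            }
        });
        System.out.println("AtomicIntegerArray in " + intMillis + " millis");
        System.out.println("AtomicBoolean[]    in " + boolMillis + " millis");
    }

    // Runs the same body on THREADS threads and returns the wall-clock time.
    static long time(Runnable body) throws InterruptedException {
        Thread[] threads = new Thread[THREADS];
        long start = System.currentTimeMillis();
        for (int i = 0; i < THREADS; i++) {
            threads[i] = new Thread(body);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return System.currentTimeMillis() - start;
    }
}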

Edit:

Ok after tweaking my program to make sure I warmed up the hotspot compiler per @mttdbrd's document, and changing the program to be able to better tune the number of entries, I see some interesting results.

With 1000 elements in the arrays:

AtomicIntegerArray in 4224 millis
AtomicBoolean[]    in 3546 millis    (always a little bit faster)

However with 10 elements in the array:

AtomicIntegerArray in 26506 millis
AtomicBoolean[]    in 13263 millis  (much faster)

Notice also the overall difference in speed between the two runs. It makes sense, since there is more thread contention: 100 threads are much more likely to have to spin with 10 elements than with 1000.

What does this mean? That if you change from one to the other you might save yourself at most 1 nanosecond per operation. Might. So instead of worrying about the performance of the two, you should pick the pattern that is the cleanest and most easily maintained.

Gray
  • 115,027
  • 24
  • 293
  • 354
  • but I still think that if the array is heavily concurrently accessed the AtomicBoolean[] is the best solution. In normal conditions I agree with you and AtomicIntegerArray is what I would use. – piacente.cristian Mar 06 '14 at 22:22
  • 2
    @Gray I like your enthusiasm for writing a program but unfortunately, micro benchmarks are basically useless: https://www.ibm.com/developerworks/java/library/j-jtp02225/ As I say above, this needs to be tested in the full production environment. – mttdbrd Mar 06 '14 at 22:38
  • Oh I know @mttdbrd. I'll add the caveats in my answer. – Gray Mar 06 '14 at 22:40
  • You also need to run it tens of thousands of times in the production environment to allow the HotSpot compiler to figure out if it needs to optimize that specific code. In short, there is no quick and easy answer. – mttdbrd Mar 06 '14 at 22:46
  • 1
    So in reading your document @mttdbrd I actually think I've done a pretty good job of it. I've warmed up the hotspot with some significant cycles although I could have done a pass first. I'm running on a multiprocessor box, I'm not testing GC, I've added some logic which should not optimize away the updates. But certainly I'm testing a single JVM on a single architecture. – Gray Mar 06 '14 at 22:50
  • The big one to me is that looping may result in loop unrolling or, even worse, simply setting a value at the end. And the JVM knows whether or not the values are used. The article seems to imply that the JVM and/or HotSpot compiler can tell whether or not variables incremented in loops are even used. Since no calculations are done and the arrays are simply being populated, the JIT may just skip over that bit of code and report being finished (without ever having done anything!). – mttdbrd Mar 06 '14 at 22:56
  • Yeah but it's updating a `volatile` array that is in the class @mttdbrd. It better not optimize it away. But always something to worry about. I added the logic because I thought it might optimize away the same operation 1000s of times in a row. :-) – Gray Mar 06 '14 at 22:59
  • Apart from the use of `currentTimeMillis`, Gray seems to take care of the basics; assuming he ran it with the usual flags to check for unusual GC activity during the benchmarks, this should be accurate (please just use `nanoTime` the next time - it's at least as accurate as millis and monotonic, contrary to millis!) – Voo Mar 06 '14 at 23:29
  • b2t: Assuming the usual architectures (not all ISAs allow atomic instructions on anything smaller than word size, and ll/sc works rather differently to CAS!) how much faster the boolean version is depends mostly on the contention. If you use an int and 2 threads try to update it at the same time, only one (at best!) can succeed, while single booleans can be updated separately without interference. On the other hand we blow up the memory size by a factor of 8. – Voo Mar 06 '14 at 23:36
  • @Gray not using jmh yet? You should try it! – assylias Mar 06 '14 at 23:49
  • Didn't know about jmh @assylias. Thanks for the reference. Looks cool although I do this pretty rarely. :-) – Gray Mar 07 '14 at 01:35
  • Understood @Voo. In these sorts of timings I tend to take the opinion that if anything is going to be learned it is going to be at the millisecond or above level. If the test runs were only nanoseconds off then it wouldn't matter. I'm running these tests on my desktop with iTunes, Chrome, and other crap in the background so precision is never really in question. :-) – Gray Mar 07 '14 at 01:38
  • @Gray so at the end it seems that my assumption of AtomicBoolean[] performing better under heavy concurrent scenarios is proven by your test. Thanks for your time. Very good job. – piacente.cristian Mar 07 '14 at 02:06
  • @Gray It's not about the accuracy (nanosecond accuracy is obviously not really what we aim for), but the millis call is not guaranteed to be monotonic and may be much less precise than the nano call (i.e. nanoseconds may be precise up to the ms, while millis may have a granularity of 20 or more ms!). The first one can be a problem, the latter is unlikely to matter much considering that we're talking about several-second runs. Won't matter most of the time, but why take unnecessary risks of contaminating your tests? – Voo Mar 07 '14 at 02:09
  • A better takeaway @piacente.cristian is that the difference is tiny so you should choose the pattern that is the simplest and easiest to understand and maintain. – Gray Mar 07 '14 at 02:19
  • @Gray I've just added an [explanation](http://stackoverflow.com/a/30422502/581205) why the difference grows with contention. It's not caused by the different performance of the classes but by the memory layout. – maaartinus May 24 '15 at 10:35
3

Actually, looking at the implementation of AtomicIntegerArray

http://fuseyism.com/classpath/doc/java/util/concurrent/atomic/AtomicIntegerArray-source.html

it seems that it is managed with more attention than I thought.

It doesn't use Objects to store the values, making it more efficient in memory. In fact it uses a simple int[] and then accesses the elements in a thread-safe way.

So I think that if you need to use many AtomicInteger it is better to use the AtomicIntegerArray.

AtomicIntegerArray: uses the Unsafe class to make atomic accesses to the single int[] held inside the AtomicIntegerArray

AtomicBoolean[]: every single element of the array has its own object (itself) for making atomic access

So I would expect better performance in a heavily concurrent, threaded environment with an AtomicBoolean[], at the cost of more memory consumption than with the AtomicIntegerArray.
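
For illustration, here is a paraphrased sketch (based on the linked source, not a verbatim copy; the class name is made up) of how AtomicIntegerArray reaches element i inside its single backing int[]: it computes a raw byte offset from the array base and CASes directly on that slot, so one object serves all the values.

import sun.misc.Unsafe;
import java.lang.reflect.Field;

// Paraphrased sketch (class name made up, not the verbatim source) of how
// AtomicIntegerArray CASes the i-th slot of its single backing int[].
public class IntArrayCasSketch {
    private static final Unsafe UNSAFE;
    private static final long BASE;  // byte offset of element 0 inside an int[]
    private static final long SCALE; // bytes per element (4 for int)

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            BASE = UNSAFE.arrayBaseOffset(int[].class);
            SCALE = UNSAFE.arrayIndexScale(int[].class);
        } catch (Exception e) {
            throw new Error(e);
        }
    }

    private final int[] array; // one object backs all the values

    public IntArrayCasSketch(int length) {
        this.array = new int[length];
    }

    public boolean compareAndSet(int i, int expect, int update) {
        long offset = BASE + (long) i * SCALE;
        return UNSAFE.compareAndSwapInt(array, offset, expect, update);
    }
}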

  • 2
    Can you add more details? Why? – Gray Mar 06 '14 at 22:02
  • 2
    An AtomicIntegerArray also has a fixed length. In fact, I wonder what an AtomicIntegerArray can do that an array of AtomicInteger can't. (except disallowing the replacement of any element in the array) – JB Nizet Mar 06 '14 at 22:04
  • Are you saying that you can change the size of AtomicIntegerArray??? Like @Gray I did not understand your reasoning... – JohnPristine Mar 06 '14 at 22:04
  • 1
    @JBNizet I asked myself the same thing, but at the very least it would be faster and more memory efficient (no need for n wrapper objects). –  Mar 06 '14 at 22:04
  • @piacente.cristian So in terms of being faster, which one would you use? AtomicIntegerArray or AtomicBoolean[] ? – JohnPristine Mar 06 '14 at 22:09
  • if you need to use them heavily concurrently I would use AtomicBoolean[], while if I expect them to be used only sometimes concurrently by your threads I would choose AtomicIntegerArray since it seems that it only uses one lock. – piacente.cristian Mar 06 '14 at 22:12
  • @piacente.cristian Can you elaborate on what you mean by "concurrently" versus "sometimes concurrently"? They are going to be used concurrently of course... but not sure what measure you are talking about here. Thanks for the help! – JohnPristine Mar 06 '14 at 22:14
  • I mean if you expect a lot of threads working on the same array concurrently, and so the bottleneck of your algorithm is the atomic access to the object, then the AtomicBoolean[] is a better solution. – piacente.cristian Mar 06 '14 at 22:15
  • Both AtomicBoolean and AtomicIntegerArray store their values in primitives. The get/set method for the AtomicIntegerArray class have much higher overhead. – mttdbrd Mar 06 '14 at 22:16
  • @mttdbrd but AtomicBoolean has its own lock for its atomic access and is an Object, which means, I suppose, that it will have more memory consumption than the same AtomicIntegerArray. – piacente.cristian Mar 06 '14 at 22:18
  • 3
    @piacente.cristian In your comments you mention locks a couple times and I wonder if you are misunderstanding the mechanism used here. (There are no locks.) – Radiodef Mar 06 '14 at 22:27
  • @Radiodef That's a good catch. The specification for the package clearly states that these variables support "lock-free thread-safe programming on single variables." – mttdbrd Mar 06 '14 at 22:31
  • @mttdbrd But does it change anything in terms of who is faster for CAS (compareAndSet)? I don't think so, but I can be wrong... – JohnPristine Mar 06 '14 at 22:32
  • 2
    Yeah this has nothing to do with contention because of the lack of locks. I've edited my answer to show that at 10 _million_ iterations I see little difference between them. – Gray Mar 06 '14 at 22:35
  • 1
    Trying to understand that too ... http://mishadoff.github.io/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/ . However @Radiodef by "lock" I meant any kind of locking/atomic mechanism... I was not caring about how it is actually implemented here – piacente.cristian Mar 06 '14 at 22:35
  • 1
    @JohnPristine It's hard to say. The method that compareAndSet calls is itself a native method (JNI), so we'd have to look at the underlying VM code and the HotSpot code. The only way to really know is to test it in a production environment. I like Gray's answer to the question the best. – mttdbrd Mar 06 '14 at 22:36
  • @piacente.cristian Then your comments are still kind of confusing. There is a very big difference between locks and volatile. Also AtomicIntegerArray is using `putIntVolatile` so it's not like writing to one element prevents a read to a different element. – Radiodef Mar 06 '14 at 22:54
2

I'd say that both are equally performant, except when heavily contended. Then (as Gray's benchmark shows), the AtomicBoolean[] wins over AtomicIntegerArray easily. What's missing is the explanation:

AtomicIntegerArray places all ints next to each other, as it operates on its internal int[], while an AtomicBoolean[] is an array of references to objects, each of which wraps an int. These objects add an overhead of a few (8 or 12) bytes, so the underlying ints are not tightly packed.

So they span a different number of cache lines, and here false sharing comes into play. As a cache line is typically 64 bytes, the whole data of new AtomicIntegerArray(10) fits into one (unless it starts unaligned, in which case two cache lines get used). This means a 100% probability of false sharing, i.e., it's as if all threads contended for a single variable.

With the overhead of AtomicBoolean[], we get something like 160 bytes instead of 40 and therefore much less false sharing.


I guess that Gray's benchmark has quite some overhead (% operations and conditions) and that the real speed difference would be bigger.


This doesn't mean that AtomicIntegerArray is bad. It's just that it shouldn't be used like this if really heavily contended. The simple solution would be to allocate a much larger array and use only every 16th element, effectively reducing false sharing to zero.
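
A minimal sketch of that padding idea (the wrapper class name and the factor of 16 are illustrative; 16 ints * 4 bytes = 64 bytes, so each logical element gets a typical cache line to itself):

import java.util.concurrent.atomic.AtomicIntegerArray;

// Illustrative sketch: keep logical elements 16 ints (64 bytes) apart so each
// one sits on its own cache line, removing the false sharing described above.
public class PaddedAtomicIntegerArray {
    private static final int STRIDE = 16; // 16 * 4 bytes = 64 bytes

    private final AtomicIntegerArray array;

    public PaddedAtomicIntegerArray(int length) {
        this.array = new AtomicIntegerArray(length * STRIDE);
    }

    public int get(int i) {
        return array.get(i * STRIDE);
    }

    public boolean compareAndSet(int i, int expect, int update) {
        return array.compareAndSet(i * STRIDE, expect, update);
    }
}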

maaartinus
  • 44,714
  • 32
  • 161
  • 320