4

I've learned that incrementing an int variable in Java is NOT an atomic operation; however, I found that CPUs support an atomic fetch-and-increment operation.

So my question is: why doesn't the JVM compile the increment of an int variable to the atomic fetch-and-increment operation that CPUs support, which could be useful in multi-threaded programming?

Early processors had atomic test-and-set, fetch-and-increment, or swap instructions sufficient for implementing mutexes that could in turn be used to implement more sophisticated concurrent objects.

--Java Concurrency in Practice
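The premise is easy to check empirically. A minimal sketch (the class name and iteration counts are my own illustrative choices): two threads incrementing a shared plain `int` will usually lose updates, while `AtomicInteger`, which does use the CPU's atomic fetch-and-increment, never does.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdates {
    static int plain = 0;                          // plain++ is read-add-write, not atomic
    static final AtomicInteger atomic = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                           // may interleave with the other thread
                atomic.incrementAndGet();          // hardware fetch-and-increment, never lost
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // atomic is always exactly 200000; plain is usually less
        System.out.println("plain=" + plain + " atomic=" + atomic.get());
    }
}
```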

Community
  • 1
  • 1
user2916610
  • 765
  • 1
  • 5
  • 12
  • 1
    @Bathsheba - `i++` is not atomic :) – TheLostMind Nov 24 '15 at 08:34
  • 3
    Why did you add c++ and c to the tags? – Manu Nov 24 '15 at 08:35
  • 3
    [this](http://stackoverflow.com/questions/25168062/why-is-i-not-atomic) could help – TheLostMind Nov 24 '15 at 08:36
  • 6
    "I found that CPUs support atomic Fetch-and-Increment operation" - often, but how certain are you that that is the case for every platform Java targets now and will ever target in the future? – Jon Skeet Nov 24 '15 at 08:37
  • [This is also relevant](http://stackoverflow.com/a/13401712/1743880) – Tunaki Nov 24 '15 at 08:37
  • @user2916610 C/C++ developers may know the answer why the JVM doesn't do some specific? Yes, very likely. – Tom Nov 24 '15 at 10:04
  • @DavidSchwartz Well, you should have written "@ user2916610", not "@ Tom", because OP was the one who misused the tags. My (sarcastic) comment was an answer about (now deleted) comments from OP. – Tom Nov 30 '15 at 14:10

7 Answers

4

So my question is: why doesn't the JVM compile the increment of an int variable to the atomic fetch-and-increment operation that CPUs support, which could be useful in multi-threaded programming?

Because on typical modern CPUs, atomic read-modify-write operations (such as incrementing) are dozens of times more expensive than their corresponding non-atomic operations. And it would provide no benefit -- code can't rely on the operations being atomic because they're not guaranteed to be atomic. So what would the benefit be?

Though it's not directly relevant to your question, because so many other people have explained this incorrectly, I'll explain the two differences between an atomic increment and a non-atomic increment (at the hardware level):

  1. An atomic increment cannot overlap certain other operations in that same core. That is, it must take place at some specific time. This means that CPU instruction pipelining is typically severely negatively impacted by atomic operations.

  2. To prevent another thread from overlapping an operation to the same cache line in the middle of our atomic operation (between the read and the write), the cache line is locked during the atomic operation. If another core attempts to take the cache line from the CPU executing the atomic operation, even for a non-atomic operation, it will have to wait until the atomic operation completes. (This used to be a bus lock. Modern CPUs are much smarter.)

Of course, there's no guarantee that every CPU will be the same, but popular Java implementations running on modern multi-core CPUs are nearly certain to have these multi-core operations highly optimized. Future CPUs, of course, may be even better.

Also, to correct another common misconception: The caches on modern multi-core CPUs communicate directly. They never need to go through main memory to synchronize CPUs (except perhaps in the rare case where the data needed is only in main memory and for some reason couldn't be prefetched). If data is in one core's cache, it can go directly to another core's cache using a variant of the MESI protocol. This is a good thing -- multi-core CPUs would perform pretty terribly if inter-core synchronization had to go through RAM.
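The cost difference is easy to get a feel for, even uncontended. A rough sketch (timings are machine- and JIT-dependent; a real measurement would use a harness like JMH, so treat the numbers as indicative only):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IncrementCost {
    static long timePlain(int n) {
        int x = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) x++;           // ordinary add, freely pipelined
        long t = System.nanoTime() - t0;
        if (x != n) throw new AssertionError();    // keep x live so the loop isn't eliminated
        return t;
    }

    static long timeAtomic(int n) {
        AtomicInteger x = new AtomicInteger();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) x.incrementAndGet();  // typically a lock-prefixed RMW on x86
        long t = System.nanoTime() - t0;
        if (x.get() != n) throw new AssertionError();
        return t;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        System.out.println("plain:  " + timePlain(n) + " ns");
        System.out.println("atomic: " + timeAtomic(n) + " ns");  // usually several times slower
    }
}
```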

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • From the first paragraph of your answer, "...atomic operations ... would provide no benefit [because] code can't rely on the operations being atomic." That doesn't make any sense. – Solomon Slow Nov 24 '15 at 14:48
  • @jameslarge Atomic operations are only beneficial if you are guaranteed to get atomic operations and can rely on the operation being atomic. Getting an atomic operation in a context where you weren't guaranteed the operation was atomic wouldn't help you because you'd still have to code as if it wasn't. – David Schwartz Nov 24 '15 at 19:05
  • I don't understand the meaning of "atomic operation in a context where you weren't guaranteed the operation was atomic." What do the words "atomic operation" mean if not "guaranteed to be atomic?" – Solomon Slow Nov 24 '15 at 19:17
  • Oh, I see the confusion. Sure, an atomic operation is guaranteed to be atomic, that's what makes it an atomic operation. But when you write code like `i++;` the compiled result might contain an atomic operation or it might not. Since you're not guaranteed that the compiler will output an atomic operation, the compiler doesn't help you by outputting one. You still have to code as if it didn't because when you wrote your code, you couldn't know that it would. – David Schwartz Nov 24 '15 at 19:20
  • 1
    So the JVM wouldn't do you any good by outputting an atomic operation here. When you wrote the code, you couldn't have known that it would do that, since it's not guaranteed. So you would have to assume it wouldn't when you wrote the code. – David Schwartz Nov 24 '15 at 19:21
3

Because the Java Language Specification (JLS) doesn't require it, and because it is an expensive operation that should be employed only when needed.

Devolus
  • 21,661
  • 13
  • 66
  • 113
3

So my question is: why doesn't the JVM compile the increment of an int variable to the atomic fetch-and-increment operation that CPUs support, which could be useful in multi-threaded programming?

Besides the obvious answer that the JVM may need to target hardware that lacks such native instructions, I want to address the more general, "Why not make every primitive operation atomic even if all the targeted hardware supports it?"

Thread safety != Thread Efficiency

Whenever you use an atomic operation like fetch-and-add/increment on hardware that supports it, you invoke a potentially far more expensive set of instructions.

With such costs, imagine using an atomic fetch-and-add to simply increment the counter in a massive loop doing very light work per iteration. Such an introduction could degrade the performance of the loop drastically to a point where the program is slowed to a fraction of its original speed.

Thread efficiency, by nature, often requires a large portion of your codebase to lack thread safety, as in the above example with the loop counter. Ideally all the code that is only going to be used by a single thread should be thread-unsafe. It shouldn't be paying the cost of locking and synchronization in places that don't need it.

We're far from the point where we have such smart compilers that can anticipate whether an operation is going to require thread safety/atomicity or not. So thread efficiency is often in the hands of the programmer for the time being, and thread-safety along with it.
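One common pattern for keeping the hot loop thread-unsafe in the sense above is to accumulate into a local variable and publish once at the end. A sketch (names and counts are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LocalAccumulate {
    static final AtomicLong total = new AtomicLong();

    static void worker(int iterations) {
        long local = 0;
        for (int i = 0; i < iterations; i++) {
            local++;                      // cheap, single-threaded increment in the hot loop
        }
        total.addAndGet(local);           // one atomic operation per thread, not per iteration
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> worker(1_000_000));
        Thread t2 = new Thread(() -> worker(1_000_000));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(total.get()); // always exactly 2000000
    }
}
```

The result is still exact, because the single publish per thread is atomic; only the per-iteration cost of atomicity is avoided.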

  • Ah I see -- so the `LOCK` instruction prefix on x64, e.g., is no longer indicative of a bus lock, but something else? –  Nov 24 '15 at 09:54
  • I see! I'll try to correct the answer -- I was already reluctant to even mention "bus lock" since it's going to a lower-level computer architecture territory than one with which I've been keeping up-to-date. I should have stayed in the safe zone and just called it "expensive". I'll update the answer to avoid the inaccuracy -- thanks! –  Nov 24 '15 at 09:57
  • Updated, though I would appreciate it if you keep your comments/links here for others who might make the same mistake that I did. –  Nov 24 '15 at 10:01
  • 1
    I think I added substantially the same point to my answer. It's not directly relevant to yours now. – David Schwartz Nov 24 '15 at 10:04
0

Because multi-core processors have a separate memory cache for each core (or, in a multiprocessor system, for each processor). To run faster, the program is loaded into this cache, because it is a lot faster than RAM, but then each core executes a thread against its own copy of memory (in the cache), often not visible to other threads.

This memory can of course be synchronized with RAM, but that operation is slow. Reading and writing directly to RAM is slow; it is the bottleneck of multi-CPU/multi-core systems, and it is why processors have cache memory in the first place: a piece of the program is preloaded into it so it can run faster (512 KB of program usually contains a lot of loops, so it executes for some time, and in that time the other cores are fed from RAM, which speeds up the whole system).

Making an operation atomic and visible to all threads means it can't simply be cached, so it needs to read and write RAM directly (or some equivalent special cache handling), and this can of course slow down the application. That is why it is not the default: more often than not you do not need the potentially slow synchronisation.
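Whatever the hardware mechanism, the contract Java itself gives you is expressed through `volatile` and synchronization. A sketch of the visibility guarantee (class and field names are my own):

```java
public class VisibleFlag {
    static volatile boolean done = false;   // without volatile, the reader might never see the write
    static int payload = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!done) { /* spin until the writer's store becomes visible */ }
            System.out.println(payload);    // guaranteed 42: the payload write happens-before done=true
        });
        reader.start();
        payload = 42;
        done = true;                        // volatile write publishes payload to the reader
        reader.join();
    }
}
```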

Krzysztof Cichocki
  • 6,294
  • 1
  • 16
  • 32
  • This is completely, 100% false and a common misconception. RAM is very slow and modern, multi-core CPUs would suck if this was true. Of course, it's *not* true. Modern CPUs use cache coherency hardware to make the caches invisible across CPUs. – David Schwartz Nov 24 '15 at 09:30
  • @David Schwartz please explain what misconception is there? Does all modern CPU's have this cache coherency hardware? Does all java VM's run on systems with cache coherency hardware? I don't think so... – Krzysztof Cichocki Nov 24 '15 at 09:39
  • Yes. Without it, performance would be so bad that there would be no point in having multi-core CPUs. All Java VM's that run on multi-core CPUs run on multi-core CPUs with cache coherency hardware. (I mean, I can't be 100% sure there isn't an exception somewhere, but it would be some weird edge case.) – David Schwartz Nov 24 '15 at 09:42
  • @David Schwartz it depends on the hardware, you don't know every hardware, you can't. It is why java does not give you a raw synchronization primitives, but rather some kind of API over the platform. – Krzysztof Cichocki Nov 24 '15 at 09:55
  • Right, that's another reason why you're wrong. Your answer says all kinds of things about the hardware and you don't know every hardware either, nor can you. But the difference is -- my answer happens to be right for pretty much every multi-core CPU in existence that Java runs on while yours is wrong for them. Hardware on which your answer is correct has never existed and likely will never exist. It's a myth, invented by people who don't understand inter-core synchronization. It has *no* basis in fact whatsoever. – David Schwartz Nov 24 '15 at 10:07
  • My answer is 100% true; you are not reading with understanding. Cache coherency hardware is not used in all processors. Cache is the only reason why one thread may not see values stored by other threads; it is why Java has memory barriers, aka synchronisation and the "volatile" keyword, to express where the points of data exchange between threads occur. – Krzysztof Cichocki Nov 25 '15 at 07:30
  • I stand by my analysis. Your answer is totally and completely wrong. I am an expert on this subject, with more than 20 years of experience. On no multi-core CPU that Java can run on, to my knowledge, does synchronization with RAM, what your answer talks about, have *anything* *whatsoever* do with memory visibility or inter-thread synchronization. Could there exist such a CPU, perhaps some obscure place or in the future, yes. Does an answer that talks about this make any sense. No, it does not. – David Schwartz Nov 25 '15 at 08:46
  • Even if you have 20 years of experience and even though you state that you are an expert in this topic, you simply can't know every CPU. How about boards with multiple CPUs? How would you implement cache coherency hardware for them without killing the entire system with synchronization traffic? – Krzysztof Cichocki Nov 26 '15 at 08:34
  • Fortunately, to answer a simple question like this one, you don't need to know every CPU. Just the most popular ones that support the platform being asked about. If you want me to explain how boards with multiple CPUs implement cache coherency in hardware without killing the system with synchronization traffic, I'd be happy to. Ask your own question. It's a fascinating subject and one that I am also an expert on. – David Schwartz Nov 27 '15 at 19:52
0

why doesn't the JVM compile the increment of an int variable to the atomic fetch-and-increment operation that CPUs support, which could be useful in multi-threaded programming?

I have asked this question a number of times, as I believe the increment should be atomic when the field is volatile. The feedback I have got is that since early versions of Java didn't do this, fixing it now would break backward compatibility, i.e. someone might have a program which unknowingly relies on the current behaviour. Given that I don't find that behaviour very useful, that it is not guaranteed an increment will behave non-atomically, and that 99%+ of the time it will in fact appear atomic anyway, I don't see a problem in making it atomic 100% of the time, especially when the field is volatile -- but that is just my opinion.
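To make the point concrete: even with `volatile`, `x++` compiles to a separate read and write, so concurrent updates can still be lost; `AtomicInteger.incrementAndGet()` is how you get the guaranteed fetch-and-increment today. A sketch (names and counts are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrement {
    static volatile int vol = 0;                    // volatile gives visibility, not atomicity
    static final AtomicInteger atomic = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                vol++;                              // volatile read, add, volatile write: can be lost
                atomic.incrementAndGet();           // guaranteed atomic
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("vol=" + vol + " atomic=" + atomic.get());
    }
}
```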

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • Just to be clear (and I think this is what you're saying), `volatile` does *not* make the increment atomic. You are arguing that it should. Also, I can't figure out what you're saying in your last sentence. Are you saying you don't see a problem with making all increments, whether of `volatile`s or not, atomic? If so, have you considered the fact that an atomic increment is dozens of times more expensive than a non-atomic increment on typical multi-core CPUs in existence today? – David Schwartz Nov 24 '15 at 10:17
  • @DavidSchwartz you are saying that `x++` for a `volatile` is much faster than `AtomicInteger.incrementAndGet()` if so, is it worth the performance trade off to have an operation which occasionally doesn't do anything. – Peter Lawrey Nov 24 '15 at 11:10
  • There's no trade off. With the current design, you get exactly what you asked for and you don't pay for what you don't need. If you're using `volatile` (rather than atomic or a lock) you're specifically trying to get the lightest, but most complicated, synchronization. If you make `volatile` equivalent to atomics, why have it at all? Syntactic sugar? – David Schwartz Nov 24 '15 at 19:07
  • @DavidSchwartz So you are suggesting that people use, and would want, the behaviour that `x++` provides for volatile fields. Do you know anyone who uses it the way it is? I would like to know more about why people would prefer its current implementation. – Peter Lawrey Nov 25 '15 at 06:57
  • I personally use it in cases where all modifications are protected by locks anyway and I just need reads to establish happens before relationships. – David Schwartz Nov 25 '15 at 08:43
  • @DavidSchwartz if all modifications are protected by locks, you can read the variable without a lock if it is volatile, but ideally you don't want the `++` to be on a volatile field at all. i.e. it stalls the CPU pipeline on a write and again when you release the lock. i.e. you want a non-volatile ++ and a volatile read. – Peter Lawrey Nov 25 '15 at 08:46
  • I don't think it does. Why would it? It's not atomic. But in any event, it's the lightest option I have, I think. – David Schwartz Nov 25 '15 at 08:48
  • @DavidSchwartz I agree, but my point is that no one would wish for or need the way `x++` works now for volatile fields. – Peter Lawrey Nov 25 '15 at 09:04
-1

--I guess that not every machine that can run Java has this as an atomic operation. Remember that Java can run on several different platforms and a lot of different devices.--

Check the answer to [Why is i++ not atomic?](http://stackoverflow.com/questions/25168062/why-is-i-not-atomic), mentioned by one of the colleagues in the comments.

Community
  • 1
  • 1
Greg Witczak
  • 1,634
  • 4
  • 27
  • 56
  • This makes no sense. If you can't implement atomic operations on a machine, then you can't implement any Java feature that requires them (such as explicitly atomic operations), and so that machine couldn't run Java. – David Schwartz Dec 01 '15 at 10:19
-1

Because:

  1. Java compilers don't compile to machine code, they compile to bytecode, and

  2. There isn't an atomic fetch-and-increment instruction in JVM bytecode, except the `iinc` instruction for local variables -- and local variables are confined to a single thread anyway, so atomicity doesn't matter for them.
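You can see this in the compiled bytecode. A sketch (the bytecode listings in the comments are what `javap -c` shows for a typical `javac`; the exact listing may vary by compiler):

```java
public class FieldIncrement {
    int field;

    void bumpField() {
        field++;   // compiles to several bytecodes: aload_0, dup, getfield, iconst_1, iadd, putfield
    }

    int bumpLocal() {
        int i = 0;
        i++;       // compiles to the single iinc instruction -- but locals are per-thread anyway
        return i;
    }

    public static void main(String[] args) {
        FieldIncrement f = new FieldIncrement();
        f.bumpField();
        System.out.println(f.field + " " + f.bumpLocal());  // prints "1 1"
    }
}
```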

user207421
  • 305,947
  • 44
  • 307
  • 483