Why doesn't the C/C++ compiler always make ++a atomic?

Question

As the title says, when we write ++a in C/C++, it seems the compiler may compile it as:

inc dword ptr[i]

which is atomic, or:

mov eax,  dword ptr[i]
inc eax
mov dword ptr[i], eax

which is not atomic.

Is there any advantage to compiling it in the non-atomic style?

`inc dword ptr[i]` is *not* atomic in the first place. Only with the `lock` prefix does it become atomic - it also becomes super slow. — harold, Jan 14 '18 at 07:14
Note that many CPUs have no equivalent of `inc dword ptr[i]`. Most "RISC" machines have only two types of instructions that access memory: load and store. All arithmetic instructions operate only on values in registers. — Jerry Coffin, Jan 14 '18 at 07:25
Consider adding a platform specific tag if you show assembly — LWimsey, Jan 14 '18 at 07:26
@Panzercrisis Sorry, I haven't tested it myself. I just read that someone else tested it and that the choice seems to be made kind of "randomly" by the compiler. — Louis, Jan 14 '18 at 07:51
@LWimsey Thank you for your suggestion :) Sorry, I just read it somewhere else (and not in English), and I am not sure which platform he was using. — Louis, Jan 14 '18 at 07:54
@JerryCoffin Thank you Jerry. If the CPU supports an instruction like `inc dword ptr[i]`, does that mean it finishes in "1 tick", or is it still split into several operations? — Louis, Jan 14 '18 at 07:54
Might take only a few ticks if the value is already in the L1 data cache (assuming the CPU uses one). Otherwise, likely to be quite a few clocks. — Jerry Coffin, Jan 14 '18 at 08:05
@JerryCoffin Ah.. I missed the memory read :( Thank you so much Jerry. So, since it is reading and writing memory either way, a single `inc` is not significantly faster than the version with the two extra `mov`s? — Louis, Jan 14 '18 at 08:08
No, probably not. Might improve decode speed a bit, but that's likely about all. — Jerry Coffin, Jan 14 '18 at 08:10
x86/x64 CPUs don't work the way you might expect. The instructions you write or the compiler generates are not the actual instructions executed inside the CPU. They are converted to micro-operations that get scheduled in the CPU's various execution units. These micro-ops may be executed out of order, and they may be speculatively executed (when there's a branch, the CPU may execute _both_ instruction paths and then discard the one that turned out not to be taken). There's a lot going on that you don't see in the x86/64 machine code. (Can anyone suggest a good introduction to this?) — Michael Geary, Jan 14 '18 at 08:17
The premise is wrong in the first place: [How come INC instruction of x86 is not atomic?](https://stackoverflow.com/q/10109679/995714), [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850/995714), [Is increment an integer atomic in x86?](https://stackoverflow.com/q/10503737/995714), [Does x86 have an atomic increment that keeps the value that was stored?](https://stackoverflow.com/q/668830/995714) — phuclv, Jan 14 '18 at 08:33
There is nothing in the C or C++ standards that *requires* it to be atomic. The compiler is therefore free to choose another implementation on any other grounds, including space or time. — user207421, Jan 14 '18 at 09:06
@MichaelGeary In particular, `inc dword ptr [i]` will load the operand into a hidden register, increment the register and write the value back into memory (or cache). It *has* to do it this way because the DDR-RAM used on current processors does not have an in-place increment operation. — Arne Vogel, Jan 14 '18 at 11:26
@ArneVogel In fact, we could state that even more strongly, as it is nothing specific to DDR memory. I'm pretty sure that none of the computers I've used in the last 50 years ever had an incrementer or other arithmetic logic built into the memory subsystem. It's always been the responsibility of the CPU. — Michael Geary, Jan 17 '18 at 22:52

score 9 · Answer 1 · answered Jan 14 '18 at 07:14

What if your code looks like this?

++a;
if (a > 1) {
  ...
}

If the compiler uses the first representation, it accesses memory to increment a, then it accesses memory again to compare to 1. In the second case, it accesses memory to get the value once and puts it in eax. Then it simply compares the register eax against 1, which is significantly quicker.
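
To make that concrete, here is a minimal C++ sketch of the increment-then-compare pattern. The function names are hypothetical and only for illustration. For a plain `int`, the compiler is free to keep the incremented value in a register and reuse it for the comparison. If you really do need an atomic increment, `std::atomic<int>` provides one, and its pre-increment returns the updated value, so the comparison does not need a second atomic load either.

```cpp
#include <atomic>
#include <cassert>

// Plain int: the compiler may load a into a register, increment it there,
// store it back, and reuse the register for the comparison.
bool plain_exceeds_one(int& a) {
    ++a;
    return a > 1;
}

// std::atomic<int>: pre-increment is one atomic read-modify-write and
// returns the updated value, so no second atomic load is needed.
bool atomic_exceeds_one(std::atomic<int>& a) {
    const int updated = ++a;
    return updated > 1;
}

// Usage sketch (hypothetical helper, not from the original post).
void demo_increment_then_compare() {
    int x = 1;
    assert(plain_exceeds_one(x));    // x is now 2
    std::atomic<int> y{0};
    assert(!atomic_exceeds_one(y));  // y is now 1
}
```

Which instructions the compiler actually emits for either function depends on the target and the optimization level, so treat the comments as what is permitted rather than what is guaranteed.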

score 2 · Answer 2 · answered Jan 14 '18 at 08:41

First, you seem to have a very particular family of processors in mind. Not all have an instruction that acts directly on memory.

Even if they do, a single instruction of that kind can be a very complex and costly thing. If it is really atomic as you claim, it has to stop all other bus transfers. This slows computation down to the speed of the memory bus, which is usually orders of magnitude slower than the CPU.
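
If you actually want an atomic increment, you have to ask for it explicitly and accept that cost. A minimal sketch, assuming C++11 or later; the helper name `count_with_atomic` is made up for illustration. Two threads increment a shared `std::atomic<int>`, and no update is lost. Doing the same with a plain `int` would be a data race, which is undefined behavior, so that variant is deliberately not shown.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads each increment a shared counter n times.
int count_with_atomic(int n) {
    assert(n >= 0);
    std::atomic<int> counter{0};
    auto work = [&counter, n] {
        for (int i = 0; i < n; ++i) {
            ++counter;  // atomic read-modify-write
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter.load();  // always 2 * n, since no increment is lost
}
```

On x86 a compiler will typically implement `++counter` here with a `lock`-prefixed instruction, which is the slow form mentioned in the comments. That is the price the compiler avoids paying for an ordinary `++a`.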

score 0 · Answer 3 · answered Jan 14 '18 at 07:13

The non-atomic variant ends up with the value in a register, ready for later use.

SoronelHaetir
