I think `atomic.Load(addr)` should be equivalent to `*addr`, and `atomic.Store(addr, newval)` should be equivalent to `*addr = newval`. So why is a plain `*addr` or `*addr = newval` not an atomic operation? Won't they eventually be compiled down to a single CPU instruction (which is atomic)?

-
Presumably for the same reason as in C ([Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/)): the compiler needs to know it can't assume no other thread changed the value (e.g. hoisting out of loops: [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/a/387478)), and can't use any optimization tricks that might result in multiple narrower stores: [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](https://stackoverflow.com/q/71866535) – Peter Cordes Jan 31 '23 at 03:35
-
A decent optimizing compiler will actually optimize, not just transliterate Go into asm. It's easy to construct cases where `*addr` doesn't lead to the desired asm when you're looking at a function that makes more than one access, or has any ordering requirements stronger than `relaxed`. – Peter Cordes Jan 31 '23 at 03:36
-
"atomic" in this context **does not** mean "executed in exactly 1 cpu instruction". Read [The Go Memory Model](https://go.dev/ref/mem) if you want to understand what it means (it's more complex than can be explained in just one comment). – Hymns For Disco Jan 31 '23 at 04:36
-
Your hardware differs from mine, and both differ from mainframes and from NUMA hardware. Just because something might be done differently on _your_ _particular_ machine doesn't mean it should be done differently in general. – Volker Jan 31 '23 at 05:57
1 Answer
Because of ordering guarantees and memory operation visibility. For instance:

```go
y := 0
x := 0
x = 1
y = 1
```

In the above program, another goroutine can observe (0,0), (0,1), (1,0), or (1,1) for x and y. This is because of the compiler reordering the code, because of compiler optimizations, or because of memory operation reordering at the hardware level. However:
```go
y := 0
x := 0
x = 1
atomic.StoreInt64(&y, 1)
```

If another goroutine sees `atomic.LoadInt64(&y) == 1`, then that goroutine is guaranteed to see `x = 1`.
Another example is busy-waiting. The following example is from the Go memory model:

```go
var a string
var done bool

func setup() {
	a = "hello, world"
	done = true
}

func main() {
	go setup()
	for !done {
	}
	print(a)
}
```
This program is not guaranteed to terminate, because the for-loop in `main` is not guaranteed to see the `done = true` assignment. The program may run indefinitely, it may print an empty string, or it may print "hello, world".
Replacing `done = true` with an atomic store, and the check in the for-loop with an atomic load, guarantees that the program always finishes and prints "hello, world".
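A fixed version might look like the following sketch. It uses `atomic.Bool` (available since Go 1.19; on older versions, `atomic.StoreInt32`/`atomic.LoadInt32` on an `int32` flag works the same way), and the hypothetical helper `run` exists only so the result is easy to check:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var a string
var done atomic.Bool // atomic flag instead of a plain bool

func setup() {
	a = "hello, world"
	done.Store(true) // atomic store: synchronizes with the load in run
}

func run() string {
	go setup()
	for !done.Load() { // atomic load: guaranteed to eventually observe true
	}
	return a // the write to a happened before done.Store(true)
}

func main() {
	fmt.Println(run()) // always terminates and prints "hello, world"
}
```

The atomic load in the loop both prevents the compiler from hoisting the check out of the loop and guarantees that, once it observes `true`, the earlier plain write to `a` is visible.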
The authoritative document on these guarantees is [The Go Memory Model](https://go.dev/ref/mem).

-
Presumably also to stop it from hoisting loads out of loops? Otherwise `while(keep_running){}` can optimize into asm like `if(keep_running) while(1){};`. Even if you don't need any ordering wrt. anything else (like C++ `memory_order_relaxed`), in most languages you still can't safely use plain non-atomic operations on shared variables. e.g. [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/a/387478). Or does Go guarantee inter-thread visibility of plain load/store with no synchronization, so it can't hoist loads out of loops? – Peter Cordes Jan 31 '23 at 05:02
-
Correct. I added compiler optimizations as a cause for this behavior. Go is one of those languages where you cannot use non-atomic operations on shared variables. However, you can rely on certain ordering guarantees of non-atomic operations based on a happened-before relationship, which relies on a synchronized-before relationship that uses synchronizing operations (like channel operations, or atomics). – Burak Serdar Jan 31 '23 at 05:06
-
Compiler optimization of that example could only remove possibilities, e.g. dead store elimination might collapse it to just storing the `1`s, not first the zeroes. Or if the initializers were static, then there's nothing to optimize in the store side. The problem with compilers hoisting loads out of loops isn't ordering, it's using the first load result indefinitely, effectively making inter-thread visibility take infinite time. Or various other problems detailed in https://lwn.net/Articles/793253/, e.g. compilers inventing loads and code running as if a value is both 0 and 1. – Peter Cordes Jan 31 '23 at 05:13
-
TL:DR: this example does not remotely cover the possible range of badness from using plain operations instead of StoreInt64 / LoadInt64. I'd worry that some future readers with enough knowledge to be dangerous would interpret this as basically saying that plain `*addr` is similar to C++ `atomic_load(addr, memory_order_relaxed)`, since you talk about the other thread being able to see values and a limited set of things it can see. – Peter Cordes Jan 31 '23 at 05:17
-
Looks good, thanks for covering another of the major things load/store atomics give you, with a classic example. – Peter Cordes Jan 31 '23 at 05:31