
I wonder what the input operand "m"(var) and the output operand "=m"(var) do in an asm statement:

asm volatile("" : "=m"(blk) : :);
asm volatile("" : : "m"(write_idx), "m"(blk) :);

I ran into the two lines above here, in an SPMC queue.

And what are the side effects? The lines above contain no asm instruction, so I believe the author was trying to exploit some well-defined side effects (e.g. flushing the values of write_idx and blk from registers back to memory, if they are held in registers, in the 2nd line?).


2 Answers

asm volatile("" : "=m"(blk) : :);

After this statement, the compiler believes that some random assembly code has written to blk. As an output operand was given, the compiler assumes that the previous value of blk doesn't matter and can discard code that assigned a value to blk that nobody read before this statement. It also must assume that blk now holds some other value and read that value back from memory instead of e.g. from a copy in registers.

In short, this looks like an attempt to force the compiler to treat blk as if it had been assigned from a source unknown to the compiler. This is usually better achieved by qualifying blk as volatile.

Note that as the asm statement is qualified volatile, the statement additionally implies an ordering requirement to the compiler, preventing certain code rearrangements.

Similar code (though without the volatile) could also be used to tell the compiler that you want an object to assume an unspecified value as an optimisation. I recommend against this sort of trick though.
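As a rough illustration (counter and vcounter below are made-up names, not taken from the question's code), the two idioms compare like this; both force a reload from memory, but only the asm version additionally declares the old value dead:

int counter;
volatile int vcounter;   // the alternative suggested above: qualify the object itself

int read_via_asm_trick()
{
    // The empty asm with an "=m" output claims that unknown code wrote to
    // counter: earlier dead stores may be dropped, and the value has to be
    // reloaded from memory afterwards.
    asm volatile("" : "=m"(counter));
    return counter;
}

int read_via_volatile()
{
    // A volatile-qualified object gives the forced reload directly, without
    // pretending that the previous value was overwritten.
    return vcounter;
}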

asm volatile("" : : "m"(write_idx), "m"(blk) :);

The compiler assumes that this statement reads from the variables write_idx and blk. Therefore, it must materialise the current contents of these variables into memory before it can execute the statement. This looks like a crude approximation of a memory barrier but does not actually effect a memory barrier as was likely intended.
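For reference, the idiom usually reached for when a full compiler barrier is wanted is a "memory" clobber rather than individual "m" operands; even that only constrains the compiler, so on weakly ordered CPUs a real fence or an atomic operation is still needed. A minimal sketch (the function names are made up):

#include <atomic>

// Classic GCC compiler barrier: the "memory" clobber makes the compiler write
// back values of memory-resident objects held in registers and forget any
// cached copies, not just those listed as "m" operands. No instruction is emitted.
static inline void compiler_barrier()
{
    asm volatile("" ::: "memory");
}

// A compiler barrier does not order memory at the CPU level; for that, use a
// real fence or an atomic operation with the appropriate memory_order:
static inline void release_fence()
{
    std::atomic_thread_fence(std::memory_order_release);
}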


Aside from this, the code you have shown must certainly be defective: C++ objects may not, unless specified otherwise, be modified concurrently without synchronisation. However, this code cannot provide such synchronisation, as it does not include any headers with synchronisation types (such as std::atomic) or facilities (such as mutexes).

Without reading the code in detail, I suppose that the author just uses ordinary objects for synchronisation, believing concurrent mutation to be well-defined. Observing that it is not, the author likely believed that the compiler was to blame and added these asm statements as kludges to make it generate code that seems to work. However, this is incorrect though it may appear to work on architectures with sufficiently strict memory models (such as x86) and with sufficiently stupid compilers or by sheer luck.

Do not program like this. Use atomics or high level synchronisation facilities instead.
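Since the original queue code is not shown here, the following is only a hedged sketch of what publishing an index with std::atomic could look like; Block, Queue, publish and poll are invented names, and wrap-around/overwrite handling, which a real SPMC queue needs, is deliberately omitted:

#include <atomic>
#include <cstddef>

struct Block { unsigned idx; /* payload ... */ };

struct Queue {
    Block blk[1024];
    std::atomic<std::size_t> write_idx{0};

    // Producer: fill the slot first, then publish the new index with a
    // release store so consumers that observe the index also observe the
    // slot's contents.
    void publish(const Block& b) {
        std::size_t i = write_idx.load(std::memory_order_relaxed);
        blk[i % 1024] = b;
        write_idx.store(i + 1, std::memory_order_release);
    }

    // Consumer: the acquire load pairs with the release store above.
    bool poll(Block& out, std::size_t& last_seen) {
        std::size_t i = write_idx.load(std::memory_order_acquire);
        if (i == last_seen) return false;
        out = blk[last_seen % 1024];
        ++last_seen;
        return true;
    }
};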

fuz
  • "As an output operand was given, the compiler assumes that the previous value of blk doesn't matter and can discard code that assigned a value to blk that nobody read before this statement." <-- do you mean the code assigning values to `blk` prior to the `asm` statement could be removed by compiler? or do you mean the compiler should generate code not to rely on the assignments for `blk` prior to the `asm` such that it should fetch from main memory? – HCSF Jan 16 '23 at 13:35
  • @HCSF Both happen. When I wrote the answer, I was seeing this as an optimisation, and not as a futile attempt at writing asynchronous code. – fuz Jan 16 '23 at 13:39
  • I do agree that using `std::atomic_ref` is a better choice here. Tho, I am trying to understand whether the author's code is indeed correct by leveraging some well defined side effects. I tried to modify the code a bit [here](https://godbolt.org/z/rz6K55Ms7), and the corresponding assembly code for the assignment for `blk.idx` doesn't get discarded. However, if I changed `"=m"(blk)` to `"=m"(blk.idx)`, it got discarded. I thought the memory address of `blk` and `blk.idx` is the same as `idx` takes up the first few bytes of the struct. No? – HCSF Jan 16 '23 at 14:14
  • @HCSF: It's in general not safe or correct (even when rolling your own atomic with a known compiler like GCC) to use just barriers without `volatile` like this: compilers can invent loads, and a value changing after one load can lead to inconsistent behaviour. See [Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) which is specifically about all the possible bad things that can happen when you write code like this, instead of doing what the Linux kernel does, using a `WRITE_ONCE` or `READ_ONCE` macro to get a `volatile T*` access. – Peter Cordes Jan 16 '23 at 14:20
  • (In GCC, a `volatile` access is atomic for types that can be cheaply atomic in asm for the target, so it's a bit like an `atomic_ref` `.load(relaxed)`. Unlike with plain assignment, where GCC will sometimes in practice split up a 64-bit store of a constant where both halves are the same, as in *[Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](https://stackoverflow.com/a/71867102)*) – Peter Cordes Jan 16 '23 at 14:23
  • That said, I wouldn't be surprised if it manages to compile with GCC to correct asm for most ISAs. I haven't looked at the code to see how it does run-time barriers, or if it's only for x86-64 so compiler-barriers are all that's needed for release/acquire synchronization. But I agree with fuz; this is a silly hack with probably zero benefit vs. using C++ `std::atomic` with appropriate `memory_order` parameters. At best perhaps a tiny benefit of maybe letting the compiler fold a load into a memory source operand for `add` or something, because it seems to miss those w. volatile and atomic – Peter Cordes Jan 16 '23 at 14:42
  • @HCSF The code cannot be correct for the simple reason that the whole way `idx` is updated is not an atomic RMW operation and hence will cause race conditions. – fuz Jan 16 '23 at 15:49
  • Also, the compiler is never going to generate read/write barriers with this code on architectures that need them. – fuz Jan 16 '23 at 15:50

The first one sounds like a bug to me. An asm output operand must always be written to. The compiler will assume that any stores to blk before the asm can be optimized out, because the asm block will set a new value. Example:

extern int foo;
int bar()
{
    auto& blk = foo;
    asm volatile("" : "=m"(blk) : :);
    return blk;
}

int baz()
{
    foo = 42;
    return bar();
}

Compiling with gcc 12.2 at -O2 will produce:

baz():
        movl    foo(%rip), %eax
        ret

As you can see, the foo = 42; has been optimized out; this is just the return blk;. For comparison, commenting out the asm gives:

baz():
        movl    $42, foo(%rip)
        movl    $42, %eax
        ret

(The calls to bar have been inlined in both cases.)
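For contrast, and as mentioned in the comments on the other answer, a READ_ONCE-style volatile access forces the reload without telling the compiler that the old value is dead, so the store to foo cannot be discarded. A hypothetical variant (bar_volatile/baz_volatile are made-up names):

extern int foo;

int bar_volatile()
{
    // Volatile read: foo must be reloaded from memory here, but the compiler
    // is not told that its previous value has been overwritten.
    return *static_cast<volatile int*>(&foo);
}

int baz_volatile()
{
    foo = 42;                // observable store, cannot be optimized out
    return bar_volatile();
}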

Jester
  • It certainly is a neat way to assign an unspecified value to an object. – fuz Jan 16 '23 at 13:10
  • Yeah, except the next line in the code does `unsigned int new_idx = blk.idx;` so using that unspecified value... – Jester Jan 16 '23 at 13:11
  • See my answer; the code is trying to implement a concurrent ring buffer but the author has never heard of atomics, so he's trying to hack something with `volatile`, 1990's style. – fuz Jan 16 '23 at 13:13
  • Could be that OP wants to communicate to the compiler that `blk` was written to asynchronously from another thread and the compiler should not assume it still holds its prior value. – fuz Jan 16 '23 at 13:22
  • @fuz the code was a single-producer-multiple-consumer queue; and so the author of the code likely had the intention to tell the compiler that `blk` could be updated by another thread asynchronously. – HCSF Jan 16 '23 at 13:33