0

I am attending an OS course as part of my undergrad and I have encountered a frustrating bug that is only present when compiling with -O2/3 flags set.

System: x86
Compiler: GCC
Emulator: Bochs/Qemu

I'm maintaining critical sections using spin locks, a TTAS implementation.

static int
xchange(int*s)
{
    int val = LOCKED;
    /* Exchanging value at lock address with 1, returns the old value */
    asm volatile("xchg (%%eax), %%ebx" : "=b"(val) : "0"(val), "a"(s));
    return val;
}

void
TTAS(int *s)
{
    /* While lock is locked, do nothing */
    while(TRUE){
        while(*s == LOCKED){}
        
        /* If lock acquired  */
        if( xchange(s)  == UNLOCKED){
        return;
        }
    }
}

Now the bug occurs when the two threads work on a shared variable with a mix of conditional waits and locks. The threads believe they have acquired the lock, but a subsequent read return with the wrong(old) value. I tried wrapping the locks for printing out the last 'owner', but this added time causes the synchronization to hold. The last and current owner of the lock:Thread 2 racing itself

If I print the value of the lock after acquiring it.

TTAS(lock <int*>);
print(lock::val);
print(lock::val);

The first print '0', the second '1'.
If I swap the TTAS with a TAS, it seemingly works.

void
TAS(int *s)
{
    /* While lock is locked, do nothing */
        While( xchange(s)  != UNLOCKED){}
}

I am unable to determine what is causing this behaviour, and hope some of you could help me with the reasoning.\

EDIT: Corrected a wrong void return on xchange

Nohr
  • 21
  • 2
  • `xchange` can't be `void` - you need it to return a value. https://godbolt.org/z/fY3cq43cK has a version that compiles, and uses `volatile int *s` and a fixed asm statement to make it work. https://lwn.net/Articles/793253/ / [Multithreading program stuck in optimized mode but runs normally in -O0](https://stackoverflow.com/q/58516052) / [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) – Peter Cordes Mar 29 '21 at 15:32

1 Answers1

2

Ref comments below and guidance from Peter Cordes a correct solution would be:

#define UNLOCKED 0
#define LOCKED 1
#define TRUE 1


static int
xchg(int volatile *s) {
    int val = LOCKED;

    asm("xchg %0, %1" : "+m"(*s), "+r"(val)::"memory");
    return val;
}

void
TTAS_acquire(int volatile *s) {

    while(TRUE){
        while(*s == LOCKED){}
        if(xchg(s) == UNLOCKED){
            return;
        }
    }
}


void
TTAS_release(int volatile *s) {
    asm("":::"memory");
    *s = UNLOCKED;
}

EDIT: Guess I jumped the gun on this one! The solution presented seemed to fix the issues, but it was just the symptom. I take this back! Also changed the wrong return value, it was never void.

EDIT2: Rewrote the answer with Peter Cordes guidance, also including the release function see comments.

Nohr
  • 21
  • 2
  • 2
    I think the actual problem is that you're missing a `"memory"` clobber in your asm statement to tell the compiler the pointed-to memory is modified, as well as producing the `"=b"` output. (BTW, there's no need to hard-code the registers like that! One normal way would be `asm("xchg %0, %1" : "+m"(*s), "+r"(val) );` which would avoid the need for a memory clobber by making the memory operand itself visible to the compiler, and getting it to pick the addressing mode.) – Peter Cordes Mar 29 '21 at 15:20
  • I also don't think "weak consistency* is a great description for non-volatile. More like *no* consistency, i.e. assume non-shared. Which is totally fine for a *local* variable! Forcing the compiler to store and reload `val` doesn't tell it that `*s` is modified. You should also use an `ACCESS_ONCE` on `*s` in the spin loop, i.e. do that through a `volatile int *`. https://lwn.net/Articles/793253/ – Peter Cordes Mar 29 '21 at 15:26
  • Note the `.L11: jmp .L11` infinite loop in GCC `-O3` output with the code in this answer which you claim fixes the bug. https://godbolt.org/z/YfTfqW6Wh (After fixing the multiple errors like `return` in a `void` function.) – Peter Cordes Mar 29 '21 at 15:29
  • @PeterCordes You are absolutely right, and thanks for pointing out the errors I put in the code there, I changed the return values to their proper return values. I looked at the objdump files and I'm still baffled this works, it seems that the only difference when having int volatile *s and val is the operations: 17: 87 02 xchg %eax,(%edx) 19: 89 44 24 0c mov %eax,0xc(%esp) 1d: 8b 44 24 0c mov 0xc(%esp),%eax VS e: 87 18 xchg %ebx,(%eax)\ 10: 89 d8 mov %ebx,%eax\ In xchg – Nohr Mar 29 '21 at 15:52
  • @PeterCordes I failed completely on the last comment, putting that code indented. Sry.. – Nohr Mar 29 '21 at 15:58
  • Comments are useless for posting code. Best to share a Godbolt link like I did (I guess with `-m32 -O3` for 32-bit code.) But yes, `int volatile *s` in `TTAS` would fix the infinite-loop problem there; but if `int volatile val` makes any difference it's just luck / chance. – Peter Cordes Mar 29 '21 at 16:00
  • Could you help me understand how this: https://godbolt.org/z/M5qexdEee is ensuring coherence, while: https://godbolt.org/z/WEdcYhdTP fails? Pardon all the edits, fresh to this site^^ – Nohr Mar 29 '21 at 16:09
  • I meant post your C source with GCC options that make it produce the asm you're actually getting, and comment one what changed in the source. (I can use the Godbolt "diff" pane to compare the asm, but since I understand how C compiles to asm it's probably easiest to see the C as well as asm.) Also, `objdump -d` on an un-linked `.o` is pretty much the least useful output, you didn't even use `-drwC` to make it show where call targets are actually going. `call 42 ` is obviously not the real call target. – Peter Cordes Mar 29 '21 at 16:14
  • @PeterCordes Working: https://godbolt.org/z/d7WWh7Pz3. Not working:https://godbolt.org/z/z6KMbqeWE – Nohr Mar 29 '21 at 16:28
  • Your "working" version still has an infinite loop in `TTAS` that execution will get into if the lock is initially locked. `.L6:` / `jmp .L6`. Exactly because `while(*s == LOCKED)` is a non-volatile access to `*s`. – Peter Cordes Mar 29 '21 at 16:48
  • Also, you don't need `-fno-unit-at-a-time` - if there's any specific function you need to not inline, use `__attribute__((noinline))`. Or if you want to see the asm for a stand-alone version of a function, make it non-`static` – Peter Cordes Mar 29 '21 at 16:48
  • @PeterCordes I hope this then:https://godbolt.org/z/x7e9P8exK? – Nohr Mar 29 '21 at 16:50
  • Yeah, that happens to work, but is pointlessly over-complicated by storing/reloading `var` before/after the asm statement. You still didn't actually tell the compiler that the asm statement can modify memory, see my initial comments and Godbolt link on the question. It still happens to work fine with plain `int var`, because the only access to the secretly-modified memory is through `volatile int *s`. – Peter Cordes Mar 29 '21 at 16:52
  • @PeterCordes Did I get it right? https://godbolt.org/z/Ma6oTEjTn . I see I have been extremely lucky with the timing of the intial code, but since I have, could you help me see what makes it run? Apparently every time I hit the first while, it has been false, but after that the print at on the value of s has changed, just not immediately. – Nohr Mar 29 '21 at 17:15
  • Yes, that's exactly what my godbolt link was doing, so yes that's right. IDK why your earlier version was avoiding an infinite loop. Could it have been inlining into a larger function resulting in different code-gen? – Peter Cordes Mar 29 '21 at 17:21
  • I think the key thing here is to understand exactly why your initial version was broken (in terms of not telling the compiler what's going on and what you need to happen), and you can already understand that from looking at the `-O3` asm for it in isolation. Oh, but if you were trying to use this for synchronization, you still haven't used a `"memory"` clobber to get a compiler barrier and stop it reordering *other* memory accesses, if TTAS inlines. So it's safely taking the lock, but *not* safely keeping the critical section contained between lock and unlock; `volatile` doesn't do that. – Peter Cordes Mar 29 '21 at 17:26
  • right right, so that in order to actually guard this, now ordered execution, I'd have to include something like: https://godbolt.org/z/E74W7cabs? Or do I need something like mfence? – Nohr Mar 29 '21 at 17:36
  • 1
    That compiler barrier is sufficient for taking the lock, you don't need `mfence`. `xchg` is already a full barrier. A lock only needs acquire/release semantics, so this is more than strong enough for taking the lock. But *unlocking* will need `asm("" ::: "memory");` `*s = UNLOCKED;` (again with a `volatile` pointer). Like this: [x86 spinlock using cmpxchg](https://stackoverflow.com/q/6935442) – Peter Cordes Mar 29 '21 at 18:24
  • @PeterCordes Would you agree that the edited answer is a potential solution? As I understand, the reason both the acquire and release need a barrier is the possibility for the compiler to schedule so that we could end up with an execution order such as: acquire(s) -> release(s) ->write_memory.? – Nohr Mar 29 '21 at 19:54
  • Yup, that's correct. The `asm("" ::: "memory")` between the critical section's loads/stores and the unlock store makes it a release-store. (The x86 asm memory model already guarantees that only StoreLoad reordering is possible (minor oversimplification), so we only have to worry about compile-time ordering.) – Peter Cordes Mar 29 '21 at 20:08
  • Thanks alot for keeping up with me this evening. You don't happen to have any good reads on the topic? Perhaps something to look into this storeLoad simplification on x86 memory handling? Thanks again, great help, learned alot today! – Nohr Mar 29 '21 at 20:44
  • https://preshing.com/20120930/weak-vs-strong-memory-models/ - preshing's articles are great introductions. For more detail on the x86 memory model, have a look at [Globally Invisible load instructions](https://stackoverflow.com/a/50617986) - it's actually program order + a store buffer with store forwarding. Also https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html - how C++11 `std::atomic` stuff maps to asm for different ISAs. – Peter Cordes Mar 29 '21 at 20:47