
I have a multi-threaded application where multiple threads access the same variable - a pointer. volatile would not be used for the purpose of ensuring atomic access (which it doesn't provide), rather to ensure the compiler doesn't kick in hard optimization because it thinks that one thread cannot affect thread 2 and concludes there is no way the value can change...

I've seen several "volatile is useless in multi-threaded applications" posts. I have a linked list that is accessed from 2 threads. Effectively it does the following:

Thread 1

  • Allocates memory for structure
  • Puts the entry into the linked list
  • Processes the linked list and checks when some value is set to xyz

Thread 2

  • Reads the linked list
  • Does some processing
  • Sets the value to xyz

All operations on the linked list are protected by a system mutex (FreeRTOS), but this doesn't ensure thread 2 sees the same value as thread 1 for the list start pointer.

There are, in my opinion, 2 critical parts in the code:

  • What threads see from the list_starts pointer
  • What threads see in the e->data value inside the pointed-to entry

This is the code that is done wrongly - it is reference code, not used in production.

#include <stdint.h>
#include <stdlib.h>
#include <stdatomic.h>

typedef struct mytype {
    struct mytype* next;
    int data;
} mytype_t;
typedef _Atomic mytype_t mytype_atomic_t;

mytype_t* list_starts;
//mytype_atomic_t* list_starts;

void
thread1() {
    //Start thread2 here...
    //start_thread(thread2);

    //Run the thread here...
    while (1) {
        //mutex_lock();
        /* Create the entry */
        mytype_t* entry = malloc(sizeof(*entry));
        if (entry != NULL) {
            entry->next = list_starts;
            list_starts = entry;
        }
        //mutex_unlock();
        
        //mutex_lock();
start_over:
        for (mytype_t* e = list_starts, *prev = NULL; e != NULL; prev = e, e = e->next) {
            //mutex_unlock();
            //Simply wait to become 3
            while (e->data != 3) {}
            //mutex_lock();

            //Remove from the list
            if (prev == NULL) {
                list_starts = e->next;
            } else {
                prev->next = e->next;
            }
            free(e);
            goto start_over;
        }
        //mutex_unlock();
    }
}

Godbolt with -mcpu=cortex-m4 -O2 produces: https://godbolt.org/z/T84K3x3G4

thread1:
        push    {r4, lr}
        ldr     r4, .L12
.L4:
        movs    r0, #8
        bl      malloc
        cbz     r0, .L2
        ldr     r3, [r4]
        str     r3, [r0]
.L6:
        //Critical part - loaded once
        ldr     r3, [r0, #4]
.L5:
        //Comparison and BNE between self -> no reload
        cmp     r3, #3
        bne     .L5
        ldr     r3, [r0]
        str     r3, [r4]
        bl      free
        ldr     r0, [r4]
        cmp     r0, #0
        bne     .L6
        b       .L4
.L2:
        ldr     r0, [r4]
        cmp     r0, #0
        beq     .L4
        b       .L6
.L12:
        .word   .LANCHOR0
list_starts:

If we change the original code to use the mytype_atomic_t* type in the for loop, like so

for (mytype_atomic_t* e = list_starts, *prev = NULL; e != NULL; prev = e, e = e->next) {
    while (e->data != 3) {}
}

Then godbolt produces: https://godbolt.org/z/94an17x1G

thread1:
        push    {r3, r4, r5, lr}
        ldr     r4, .L13
        ldr     r5, [r4]
.L4:
        movs    r0, #8
        bl      malloc
        cbz     r0, .L2
        str     r5, [r0]
        str     r0, [r4]
.L6:
        adds    r2, r0, #4
.L5:
        //Barrier
        dmb     ish
        //Load first...
        ldr     r3, [r2]
        dmb     ish
        cmp     r3, #3
        //Jump back to pre-load
        bne     .L5
        dmb     ish
        ldr     r3, [r0]
        dmb     ish
        str     r3, [r4]
        bl      free
        ldr     r0, [r4]
        cmp     r0, #0
        bne     .L6
.L7:
        movs    r5, #0
        b       .L4
.L2:
        cmp     r5, #0
        beq     .L7
        mov     r0, r5
        b       .L6
.L13:
        .word   .LANCHOR0
list_starts:

The ultimate question - is it enough to simply:

  • declare custom type with _Atomic keyword?
  • Do not use volatile at all in such case?
unalignedmemoryaccess
    `volatile` isn't useless, it's how we rolled our own atomics in the bad old days before C11. It's not *recommended* or good, or guaranteed to be portable, but it does work in practice: [When to use volatile with multi threading?](https://stackoverflow.com/a/58535118) It's somewhat similar to `_Atomic` with `memory_order_relaxed`, at least on compilers like GCC that support this usage (e.g. the Linux kernel relies on it) by not splitting up a `volatile` access into two halves or anything like that, [like it can for non-volatile](//stackoverflow.com/q/71866535/224132) – Peter Cordes Jun 06 '23 at 22:09

1 Answer


The ultimate question - is it enough to simply:

  • declare custom type with _Atomic keyword?

No, this is wrong for your purpose. You are not allowed to access the members of an atomic structure (undefined behavior results). All you can do with one is read or write its whole (structure) value.
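To illustrate that rule, here is a minimal sketch using a scaled-down 8-byte stand-in structure (so that natively supported atomic instructions suffice): the whole value can be atomically loaded or stored, but member access on the atomic object itself is not permitted.

```c
#include <stdatomic.h>

struct pair { int a; int b; };   /* 8-byte stand-in for mytype_t */

_Atomic struct pair node;

int read_b(void) {
    struct pair snapshot = atomic_load(&node);  /* OK: whole-value atomic load */
    /* return node.b; */                        /* NOT allowed: member access on
                                                   an atomic structure is
                                                   undefined behavior */
    return snapshot.b;
}
```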

If you were not performing deletes, then it might be sufficient to declare both members _Atomic, and also the list head pointer:

struct mytype {
    struct mytype * _Atomic next;
    _Atomic int data;
};

struct mytype * _Atomic list_starts;

(Apply typedefs if you feel you must, though as a general rule, I recommend avoiding them.)
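Under that declaration, the insert-only case could then use a compare-and-swap loop on the head pointer. This is a sketch of the idea, not production code, and it is safe only because no thread deletes:

```c
#include <stdatomic.h>
#include <stdlib.h>

struct mytype {
    struct mytype * _Atomic next;
    _Atomic int data;
};

struct mytype * _Atomic list_starts;

/* Prepend a node; safe against concurrent prepends, but NOT against deletes. */
int push_front(int value) {
    struct mytype *entry = malloc(sizeof *entry);
    if (entry == NULL)
        return 0;
    atomic_init(&entry->data, value);
    struct mytype *old_head = atomic_load(&list_starts);
    do {
        /* Re-point next at the current head; retried if the CAS fails. */
        atomic_store(&entry->next, old_head);
    } while (!atomic_compare_exchange_weak(&list_starts, &old_head, entry));
    return 1;
}
```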

  • Do not use volatile at all in such case?

You do not need volatile, and it would not be sufficient if you used it (as far as can be judged by the C language specification; implementations may provide stronger guarantees).

However, the involvement of deletions from the list, and especially of frees, will require you to perform some kind of locking. No thread may access an allocated object after it has been freed, and you cannot ensure that it is safe to free one of your list nodes without engaging some kind of modifiable state that advises (or forces) other threads to wait -- that's the definition of a lock.

I can see why you want to avoid mutexes here, and especially to avoid a single mutex for the whole list. You may want to consider rolling your own lock with an additional _Atomic member of the structure, used with a busy loop performing an atomic compare and swap to perform locking and unlocking. Something like this, maybe:

struct mytype {
    struct mytype * _Atomic next;
    _Atomic int data;
    _Atomic _Bool is_locked;
};

void lock_mytype(struct mytype *x) {
    _Bool locked;
    do {
        locked = 0;
    } while (!atomic_compare_exchange_weak(&x->is_locked, &locked, 1));
}

void unlock_mytype(struct mytype *x) {
    _Bool locked = 1;
    if (!atomic_compare_exchange_weak(&x->is_locked, &locked, 0)) {
        // attempted to unlock a mutex that is not locked
        abort();
    }
}

That's pretty bare-bones, of course. At minimum, you could consider augmenting it to recognize which thread holds the lock, and to fail if a different one tries to unlock it.
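One possible sketch of that augmentation tracks the owner with a caller-supplied thread id (the struct and function names here are illustrative, and how a thread obtains its own id is left to the platform, e.g. the FreeRTOS task handle):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct mylock {
    _Atomic _Bool is_locked;
    _Atomic uintptr_t owner;        /* id of the thread holding the lock */
};

/* 'self' is the caller's thread id, however the platform represents it. */
void lock_owned(struct mylock *l, uintptr_t self) {
    _Bool locked;
    do {
        locked = 0;
    } while (!atomic_compare_exchange_weak(&l->is_locked, &locked, 1));
    atomic_store(&l->owner, self);
}

void unlock_owned(struct mylock *l, uintptr_t self) {
    if (atomic_load(&l->owner) != self) {
        abort();                    /* a different thread tried to unlock */
    }
    atomic_store(&l->owner, 0);
    atomic_store(&l->is_locked, 0);
}
```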

To a first approximation, each thread will need to lock each node before accessing its members (other than is_locked). That moots the other members being atomic. To perform a deletion safely, you must lock both the predecessor and the node to delete, predecessor first. Update the predecessor's next member while holding both locks, then the predecessor's lock may be released. The deleted node's lock does not need to be released.
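That deletion protocol might be sketched as follows (the lock helpers are repeated so the fragment is self-contained, and delete_after is an illustrative name):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct mytype {
    struct mytype *next;
    int data;
    _Atomic _Bool is_locked;
};

void lock_mytype(struct mytype *x) {
    _Bool locked;
    do {
        locked = 0;
    } while (!atomic_compare_exchange_weak(&x->is_locked, &locked, 1));
}

void unlock_mytype(struct mytype *x) {
    _Bool locked = 1;
    if (!atomic_compare_exchange_weak(&x->is_locked, &locked, 0))
        abort();
}

/* Unlink and free the node after 'prev': lock the predecessor first, then
   the victim; update prev->next while holding both locks; release only the
   predecessor's lock. The victim's lock is deliberately never released. */
void delete_after(struct mytype *prev) {
    lock_mytype(prev);
    struct mytype *victim = prev->next;
    if (victim != NULL) {
        lock_mytype(victim);
        prev->next = victim->next;
    }
    unlock_mytype(prev);
    free(victim);                   /* free(NULL) is a no-op */
}
```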

Note also that structuring linked lists with dummy head nodes instead of bare head pointers tends to provide for simpler code by eliminating the need for special cases. You might find that especially useful here, yielding something more like this:

struct mytype {
    struct mytype* next;
    int data;
    _Atomic _Bool is_locked;
};

struct mytype dummy_list_head;

void thread1() {
    while (1) {
        struct mytype* entry = malloc(sizeof(*entry));
        if (entry == NULL) {
            abort();
        }
        lock_mytype(&dummy_list_head);
        entry->next = dummy_list_head.next;
        dummy_list_head.next = entry;
        unlock_mytype(&dummy_list_head);
        
        while (1) {
            struct mytype *e;
            lock_mytype(&dummy_list_head);
            e = dummy_list_head.next;
            unlock_mytype(&dummy_list_head);
            if (e == NULL) break;

            int data;
            do {
                lock_mytype(e);
                data = e->data;
                unlock_mytype(e);
            } while (data != 3);

            lock_mytype(&dummy_list_head);
            lock_mytype(e);
            dummy_list_head.next = e->next;
            unlock_mytype(&dummy_list_head);
            free(e);
        } 
    }
}

You might be able to tweak that a bit based on the knowledge that thread 1 is the only one that will ever perform deletions. That might involve making the data member atomic after all. But again, you need some form of locking here.
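That last variant - making data atomic after all - might look something like this sketch (mark_done and wait_done are illustrative names, not from the original code):

```c
#include <stdatomic.h>
#include <stddef.h>

struct mytype {
    struct mytype *next;
    _Atomic int data;        /* atomic, so thread 2 can set it lock-free */
    _Atomic _Bool is_locked;
};

/* Thread 2: mark a node done; no node lock needed for the atomic member. */
void mark_done(struct mytype *e) {
    atomic_store(&e->data, 3);
}

/* Thread 1: poll the atomic member; every iteration performs a real load,
   which is the reload guarantee the question was after. */
void wait_done(struct mytype *e) {
    while (atomic_load(&e->data) != 3) {
        /* spin; on FreeRTOS one would normally yield or block here */
    }
}
```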

John Bollinger
  • *To perform a deletion safely, you must lock both the predecessor and the node to delete, predecessor first* How do you handle the predecessor node being deleted while you're waiting for the lock? – Andrew Henle Jun 06 '23 at 23:48
  • 1
    @AndrewHenle, As far as I can tell from the question, only one thread performs any deletions, and that is the one that needs to lock the predecessor in the first place, so I don't think it's an issue in this case. But if you needed to accommodate other threads performing deletions then I think you would need to walk the list from the beginning, locking each node in turn (and unlocking that node's predecessor) until you reach the predecessor of the node to delete. – John Bollinger Jun 07 '23 at 02:18
  • 1
    Also, @AndrewHenle, in this particular case, it is always the first node that is deleted, so the predecessor is the dummy head node or an equivalent, which cannot ever be deleted (but can and must still be locked). – John Bollinger Jun 07 '23 at 02:22
  • Thanks for the answer. I do not want to NOT use a mutex between threads - this I am already doing. Example function calls are in the comments. What I'd like to guarantee is that both threads will be forced to call an LDR instruction to load the new value pointed to by the list start, to make sure all threads, all the time (whenever the lock allows), see the full list of data, and not some register cache. – unalignedmemoryaccess Jun 07 '23 at 02:43
  • @unalignedmemoryaccess, this answer suggests a lighter-weight alternative to mutexes, but as it already explains, there is no safe alternative to using some kind of mutex-like locking. This is easier to see at the C-language level than at the assembly level, which is, after all, one of the reasons to use high(er) level languages. – John Bollinger Jun 07 '23 at 12:14
  • Nevertheless, @unalignedmemoryaccess, if you want to ensure usage of a particular instruction, then your compiler probably supports some dialect of inline assembly. And, of course, you are free to take complete control by writing the whole thing in assembly. But I don't think even full-assembly would provide a way to do the job safely without some variation on locking. – John Bollinger Jun 07 '23 at 12:20
  • @JohnBollinger there is no problem of missing locking - certainly there is a mutex_lock and mutex_unlock before and after modification of the linked list. The trick I want to make sure of is how to ensure (without volatile?) that when we execute the for loop, the first entry is always loaded with the latest value of the list pointer, and to avoid the first entry being put into one of the registers and staying there forever. – unalignedmemoryaccess Jun 07 '23 at 19:15
  • For instance this code in his example: `do { lock_mytype(e); int data = e->data; unlock_mytype(e); } while (data != 3);` How to ensure that `e->data` is always loaded from memory and not from a register, because it can get changed in another thread and not in the sequential code. – unalignedmemoryaccess Jun 07 '23 at 19:16
  • I'm not sure I follow, @unalignedmemoryaccess. Are you saying that we should ignore that the mutex lock and unlock calls in your example code have been commented out? If so, then that was unnecessarily confusing. Especially because it moots the question. If you are using a mutex in such a way as to avoid any data races, then the problem you seem to be positing does not arise. Writes performed by one thread before releasing the mutex are visible to the other after it acquires the same mutex (or the mutex implementation is flawed to the point of uselessness). No `volatile` required. – John Bollinger Jun 07 '23 at 19:29
  • And, if you are using C11 mutexes, as described in the language specification, then an implementation that fails to provide those memory semantics is not just useless, but non-conforming. – John Bollinger Jun 07 '23 at 19:34
  • I may have misled you, sorry for that - not by intention. But the point I try to understand is how to ensure that there will be a LOAD instruction from memory for every loop, and not the compiler assuming it can load it only before the while loop and never load it from memory again. I am not using C11 mutexes. For embedded, these are mostly FreeRTOS or similar OS. – unalignedmemoryaccess Jun 07 '23 at 19:56
  • Again, @unalignedmemoryaccess, if you are in fact using mutexes in the first place, then ensuring correct memory semantics is part of their *point*. I cannot rule out that some random mutex implementation fails to do that, but neither can I rule out that any other aspect of any given implementation is incorrect -- the behavior of atomic or `volatile` objects, say. If your program has data races then you might see the kind of misbehavior I now think you're asking about, but if your mutex use is sufficient to avoid data races, then you should not. – John Bollinger Jun 07 '23 at 20:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253997/discussion-between-unalignedmemoryaccess-and-john-bollinger). – unalignedmemoryaccess Jun 08 '23 at 05:14