I have a few questions regarding memory barriers.
Say I have the following C code (it will be run from both C and C++ code, so atomics are not possible) that copies one array into another. Multiple threads may call thread_func(), and I want to make sure that my_str is returned only after it has been fully initialized. It is a given that the last byte of the buffer can't be 0, so checking that the last byte is nonzero should suffice.
Due to reordering by the compiler/CPU, this can be a problem, as the last byte might get written before the previous bytes, causing my_str to be returned with a partially copied buffer. To get around this, I want to use a memory barrier. A mutex would work of course, but would be too heavy for my uses.
Keep in mind that all threads will call thread_func() with the same input, so even if multiple threads call init() a couple of times, it's OK as long as, in the end, thread_func() returns a valid my_str, and all subsequent calls after initialization return my_str directly.
Please tell me whether each of the following approaches works, or whether there could be issues in some scenarios. Aside from solving this particular problem, I'd like to learn more about memory barriers.
1. __sync_bool_compare_and_swap on the last byte. If I understand correctly, no memory store/load would be reordered across it, not just accesses to the particular variable passed to the call. Is that correct? If so, I would expect this to work, since all writes of the earlier bytes should complete before execution moves past the barrier.

```c
#include <stdint.h>

#define STR_LEN 100

static uint8_t my_str[STR_LEN] = {0};

static void init(const char input_buf[STR_LEN])
{
    /* Plain stores for every byte except the last one. */
    for (int i = 0; i < STR_LEN - 1; ++i) {
        my_str[i] = (uint8_t)input_buf[i];
    }
    /* Publish the last byte with a full-barrier CAS. */
    __sync_bool_compare_and_swap(my_str + STR_LEN - 1, 0,
                                 (uint8_t)input_buf[STR_LEN - 1]);
}

const char *thread_func(const char input_buf[STR_LEN])
{
    if (my_str[STR_LEN - 1] == 0) {
        init(input_buf);
    }
    return (const char *)my_str;
}
```
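To make the intended behavior of approach 1 concrete, here is a minimal single-threaded sketch of what I expect from thread_func(). It assumes the CAS targets my_str + STR_LEN - 1 (the last byte, as described above), and check_single_threaded() is a hypothetical helper added only for illustration. It says nothing about whether the ordering is actually safe across threads; it only checks the copy and the "already initialized" fast path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STR_LEN 100

static uint8_t my_str[STR_LEN] = {0};

static void init(const char input_buf[STR_LEN])
{
    /* Plain stores for bytes 0 .. STR_LEN - 2. */
    for (int i = 0; i < STR_LEN - 1; ++i) {
        my_str[i] = (uint8_t)input_buf[i];
    }
    /* The last byte is written only if it is still 0. */
    __sync_bool_compare_and_swap(my_str + STR_LEN - 1, 0,
                                 (uint8_t)input_buf[STR_LEN - 1]);
}

const char *thread_func(const char input_buf[STR_LEN])
{
    if (my_str[STR_LEN - 1] == 0) {
        init(input_buf);
    }
    return (const char *)my_str;
}

/* Hypothetical helper: exercises thread_func() from a single thread. */
int check_single_threaded(void)
{
    char input[STR_LEN];
    memset(input, 'a', STR_LEN);   /* last byte is nonzero, as required */

    const char *first = thread_func(input);   /* runs init() */
    const char *second = thread_func(input);  /* takes the fast path */

    assert(first == second);
    assert(memcmp(first, input, STR_LEN) == 0);
    return 1;
}
```

Calling check_single_threaded() once from main() is enough to exercise both paths.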
2. __sync_bool_compare_and_swap on each write. I would expect this to work as well, but to be slower than the first one.

```c
static void init(char input_buf[STR_LEN])
{
    for (int i = 0; i < STR_LEN; ++i) {
        __sync_bool_compare_and_swap(my_str + i, 0, input_buf[i]);
    }
}
```
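For reference, this is the compare-and-swap behavior I am relying on when init() may run more than once: the store happens only while the byte still holds the expected value 0, so a repeated call leaves an already-written byte untouched. A small sketch, where check_cas_semantics() and the values 42 and 7 are hypothetical and chosen only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: shows when the CAS stores and when it does not. */
int check_cas_semantics(void)
{
    uint8_t byte = 0;

    /* byte == 0 matches the expected value, so 42 is stored. */
    bool first = __sync_bool_compare_and_swap(&byte, 0, 42);

    /* byte == 42 no longer matches 0, so nothing is stored. */
    bool second = __sync_bool_compare_and_swap(&byte, 0, 7);

    assert(first);
    assert(!second);
    assert(byte == 42);
    return 1;
}
```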
3. __sync_synchronize before each byte copy. I would expect this to work as well, but is this slower or faster than (2)? __sync_bool_compare_and_swap is supposed to be a full barrier as well, so which would be preferable?

```c
static void init(char input_buf[STR_LEN])
{
    for (int i = 0; i < STR_LEN; ++i) {
        __sync_synchronize();
        my_str[i] = input_buf[i];
    }
}
```
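One rough way to compare (2) and (3) empirically is a single-threaded micro-benchmark like the sketch below. The names copy_with_cas, copy_with_sync, time_copy, compare_variants and the ROUNDS count are all hypothetical, and the numbers will depend heavily on the CPU, the compiler flags, and the absence of contention, so treat it as a starting point and not as a definitive measurement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define STR_LEN 100
#define ROUNDS 10000   /* arbitrary repetition count */

static uint8_t dst_cas[STR_LEN];
static uint8_t dst_sync[STR_LEN];

/* Variant (2): one CAS per byte. */
static void copy_with_cas(const char *src)
{
    for (int i = 0; i < STR_LEN; ++i) {
        __sync_bool_compare_and_swap(dst_cas + i, 0, (uint8_t)src[i]);
    }
}

/* Variant (3): one full barrier followed by a plain store per byte. */
static void copy_with_sync(const char *src)
{
    for (int i = 0; i < STR_LEN; ++i) {
        __sync_synchronize();
        dst_sync[i] = (uint8_t)src[i];
    }
}

/* Returns CPU seconds spent on ROUNDS copies; dst is zeroed before each
 * round so that the CAS variant actually stores every time. */
static double time_copy(void (*copy)(const char *), uint8_t *dst,
                        const char *src)
{
    clock_t start = clock();
    for (int r = 0; r < ROUNDS; ++r) {
        memset(dst, 0, STR_LEN);
        copy(src);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int compare_variants(void)
{
    char src[STR_LEN];
    memset(src, 'x', STR_LEN);

    double cas_seconds = time_copy(copy_with_cas, dst_cas, src);
    double sync_seconds = time_copy(copy_with_sync, dst_sync, src);
    printf("CAS per byte:     %f s\n", cas_seconds);
    printf("barrier per byte: %f s\n", sync_seconds);

    /* Both variants must end up with the same bytes. */
    assert(memcmp(dst_cas, src, STR_LEN) == 0);
    assert(memcmp(dst_sync, src, STR_LEN) == 0);
    return 1;
}
```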
4. __sync_synchronize by condition. As I understand it, __sync_synchronize is both a HW and a SW memory barrier. As such, since the compiler can't tell the value of use_sync, it shouldn't reorder, and the HW barrier will take effect only if use_sync is true. Is that correct?

```c
#include <stdbool.h>

static void init(char input_buf[STR_LEN], bool use_sync)
{
    for (int i = 0; i < STR_LEN; ++i) {
        if (use_sync) {
            __sync_synchronize();
        }
        my_str[i] = input_buf[i];
    }
}
```
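If relying on the compiler not knowing use_sync turns out to be too fragile, one possible variant makes the SW part explicit: an empty asm statement with a "memory" clobber, which is a GCC/Clang idiom for a compiler-only barrier that emits no instruction. The sketch below assumes GCC or Clang; COMPILER_BARRIER and check_conditional_barrier() are hypothetical names, and the check only verifies the copy from a single thread, not the ordering guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STR_LEN 100

/* GCC/Clang idiom: stops compiler reordering across this point,
 * emits no CPU fence instruction. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static uint8_t my_str[STR_LEN];

static void init(const char input_buf[STR_LEN], bool use_sync)
{
    for (int i = 0; i < STR_LEN; ++i) {
        if (use_sync) {
            __sync_synchronize();   /* HW + SW barrier */
        } else {
            COMPILER_BARRIER();     /* SW barrier only */
        }
        my_str[i] = (uint8_t)input_buf[i];
    }
}

/* Hypothetical helper: both paths must produce the same copy. */
int check_conditional_barrier(void)
{
    char input[STR_LEN];
    memset(input, 'z', STR_LEN);

    init(input, true);
    assert(memcmp(my_str, input, STR_LEN) == 0);

    memset(my_str, 0, STR_LEN);
    init(input, false);
    assert(memcmp(my_str, input, STR_LEN) == 0);
    return 1;
}
```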