7

Is there a problem with multiple threads using the same integer memory location between pthreads in a C program without any synchronization utilities?

To simplify the issue,

  • Only one thread will write to the integer
  • Multiple threads will read the integer

This pseudo-C illustrates what I am thinking

#include <pthread.h>

void *thread_main(void *arg) {
  int *a = arg;
  // wait for something to finish
  // dereference 'a', make decision based on its value
  return NULL;
}

int value = 0;

int main(void) {
  pthread_t tid[10];

  for (int i = 0; i < 10; i++)
    pthread_create(&tid[i], NULL, thread_main, &value);

  // do something
  value = 1;
}

I assume it is safe, since an integer occupies one processor word, and reading/writing to a word should be the most atomic of operations, right?

Mike
  • 2
    Synchronization and mutual exclusion are subtly different. Your code guarantees mutual exclusion because of the atomic load/store of the processor word. However, you have not highlighted any synchronization (what happens before/after what) requirements in your question. – Sonny Saluja Jan 03 '11 at 22:27

7 Answers

12

Your pseudo-code is NOT safe.

Although accessing a word-sized integer is indeed atomic, meaning that you'll never see an intermediate value but only "before write" or "after write", this isn't enough for your outlined algorithm.

You are relying on the relative order of the write to `a` and some other change that wakes the thread. That combination is not an atomic operation, and the ordering is not guaranteed on modern processors.

You need some sort of memory fence to prevent write reordering. Otherwise it's not guaranteed that other threads EVER see the new value.
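
For illustration, a minimal sketch of one way to get that ordering, assuming a C11 compiler: make the flag an atomic and pair a release store with acquire loads. The `data` payload and the function names here are invented for the example.

#include <stdatomic.h>

atomic_int value = 0;   // the flag the writer publishes
int data = 0;           // hypothetical payload the flag "protects"

void *thread_main(void *arg) {
  (void)arg;
  // acquire load: once we observe 1, the writer's earlier stores
  // (e.g. to 'data') are guaranteed to be visible to this thread
  while (atomic_load_explicit(&value, memory_order_acquire) == 0)
    ;  // spin; a real program would block instead
  // safe to use 'data' here
  return NULL;
}

void writer(void) {
  data = 42;  // do the work first
  // release store: the stores above cannot be reordered after this one
  atomic_store_explicit(&value, 1, memory_order_release);
}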

Ben Voigt
  • 2
    On x86 and any platform with sane memory ordering requirements, or on a non-SMP machine where all threads take turns running on a single cpu, making the variable `volatile` is sufficient. Another approach would be writing the variable via an external function which takes a pointer to the variable and its new value. This will prevent reordering, and on pathological architectures, you could add a memory fence/barrier operation to the cpu-specific implementation of this function. – R.. GitHub STOP HELPING ICE Jan 03 '11 at 23:11
  • I'll clarify. This would work fine on x86 using gcc to compile. If the optimizer used register optimizations on 'value' ... use volatile, though gcc avoids reg opt with code accessing global variables. Even if you put mutexes and locks around 'value' the compiler can still do register optimization. Furthermore, the race condition to acquire the lock is the same as the race condition to read value. Finally, you are suggesting extra code to force the atomicity of an operation that is already atomic! – johnnycrash Apr 14 '11 at 16:44
  • 2
    @jonnycrash: "race condition to acquire the lock"? Where did I say anything about locking? And you're still focused on atomicity, even after my answer clearly explained that the issue here is ordering, not atomicity. Finally, since the pseudo-code makes no suggestion of volatile, it is wrong (even on x86, note that the question never said the code would only run on x86). Finally, your "fix" is compiler-specific. The C++ standard doesn't require a memory-fence for volatile access, and the C++ memory model deals with sequential consistency, it doesn't (yet) cover parallel execution. – Ben Voigt Apr 14 '11 at 16:58
  • [Visual C++ "volatile" creates a memory fence since VC++2005](http://msdn.microsoft.com/en-us/library/ms684208.aspx), maybe gcc does as well, but earlier versions did not. Plus, `volatile` is overkill and eliminates safe and valid optimizations in a way that an explicit memory barrier would not. – Ben Voigt Apr 14 '11 at 17:00
I have to apologize for the -1 and figure out how to undo it. I think my confusion is that you are correct that the **CPU** can re-order the instructions and execute 'value=1' before another pipe finishes the loop, but in real life this happens extremely rarely. So my exhaustive tests on our thread library have never exposed this. What you mean by memory fence is __sync_synchronize, which would be placed before the line 'value = 1'. I mentioned x86 b/c I had not tested this on other arch. I mistook "reorder" for compiler reordering, hence volatile. – johnnycrash Apr 14 '11 at 19:50
@ben. I misinterpreted your answer as "you can't ever read/write global variables without a lock". I have 2 questions. If the CPU re-orders instructions, how can that prevent threads from ever seeing the new value? I can understand seeing the new value earlier or later than intended, but not that value would never change to 1. Second, where can I read about memory fences in a way that makes it understandable? I see that there are different types (full, acquire, release, etc). I would like to know for example which type is minimally required here. – johnnycrash Apr 14 '11 at 19:55
@ben. I gave you +1 on a couple other answers to compensate for the -1. – johnnycrash Apr 14 '11 at 19:57
@Johnny: For another thread to never see the new value is unusual in code that has good design in other ways, since sooner or later you call an OS function that includes a barrier. So let's assume a poor design, and in particular a busy-wait in `thread_main`. `while (!a);` If `a` is not declared volatile, then the compiler is permitted to hoist the condition outside the loop, since the loop body cannot change it: `if (!a) while (true);`. A practical design would call an OS wait function, though, and then the compiler could not prove that `a` wasn't modified inside the loop body. – Ben Voigt Apr 14 '11 at 22:37
  • @Johnny: Ordering also is usually not a problem, but consider if the busy-wait is "fixed" with `volatile a;`. If the statement before the busy-wait and the code after the busy wait both use variables in the same cache line, then the CPU will load the whole cache-line before the wait loop. The compiler could even choose to use a wider memory fetch and load two variables at once. Flow analysis shows that the value of the other variable hasn't been modified in the loop, so the code after the loop can use the previously read value, now out of date. – Ben Voigt Apr 14 '11 at 22:42
@Johnny: Practically speaking again, you use an OS wait function, which not only forces the compiler to reload `a` but also produces memory fences to synchronize caches and limit reordering in the pipeline. And I too wish I understood the difference between the different memory fences. – Ben Voigt Apr 14 '11 at 22:43
1

Unlike Java, where you explicitly start a thread, POSIX threads start executing immediately.
So there is no guarantee that setting the value to 1 in the main function (assuming that is what you refer to in your pseudocode) happens before or after the threads try to access it.
So while it is safe to read the integer concurrently, you need some synchronization if you write to the value and want the threads to see the new value.
Otherwise there is no guarantee which value they will read (in order to act depending on the value, as you note).
You should not make assumptions about multithreading, e.g. that there is some processing in each thread before it accesses the value.
There are no guarantees.
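
For example, a minimal sketch of one such synchronization, assuming a pthread mutex around the shared integer (the accessor names are invented for the illustration):

#include <pthread.h>

static pthread_mutex_t value_lock = PTHREAD_MUTEX_INITIALIZER;
static int value = 0;

// writer: the main thread sets the value under the lock
static void set_value(int v) {
  pthread_mutex_lock(&value_lock);
  value = v;
  pthread_mutex_unlock(&value_lock);
}

// readers: each thread reads the value under the same lock,
// which guarantees it sees the most recent write
static int get_value(void) {
  pthread_mutex_lock(&value_lock);
  int v = value;
  pthread_mutex_unlock(&value_lock);
  return v;
}

The lock/unlock pair also acts as the needed memory barrier, so no volatile is required (as R.. notes in a comment below).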

Cratylus
0

I wouldn't count on it. The compiler may emit code that assumes it already knows the value of 'value' at any given time, keeping it in a CPU register instead of re-loading it from memory.
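
As a hypothetical illustration of that failure mode (this snippet is mine, not the answer's), an optimizing compiler is allowed to load a plain global once and keep it in a register:

int value;  // shared with another thread, but not volatile and not atomic

void wait_for_value(void) {
  // Nothing in the loop body changes 'value', so the optimizer may read it
  // once into a register and never re-load it -- effectively turning this
  // into an infinite loop even after another thread stores 1.
  while (value == 0) {
    // busy-wait
  }
}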

bot403
  • Does it? Is it necessary to use `volatile` on any shared variables, or are the compilers smart enough to determine that? – Kos Jan 03 '11 at 22:28
  • `volatile` affects caching values in registers, but on at least some compilers it doesn't affect the processor's internal write-reordering which can be equally fatal. – Ben Voigt Jan 03 '11 at 22:29
  • @Kos: It is not practical for the compiler to determine that a variable is externally modified, because the compiler knows nothing about threads - it cannot know that any two functions will be running concurrently. It could always assume worst case and never optimise away a memory read, but that rather defeats the object of optimisation! Even if it were thread aware, a variable modified in a separate compilation unit would not be seen by the compiler. The amount of static analysis required to *prove* concurrent access in any moderately complex system is *huge*. – Clifford Jan 03 '11 at 22:47
So that means that, *on top of* normal thread synchronisation issues, I should also take care to make *every* shared variable volatile? Does this apply to OpenMP also? – Kos Jan 03 '11 at 22:56
  • 1
    Note that `volatile` is not necessary if you use pthreads because the `pthread_mutex_lock` and related functions are specified such that the compiler, when compiling the caller, must assume the lock operation could have modified any variable in the program that's reachable. – R.. GitHub STOP HELPING ICE Jan 03 '11 at 23:13
0

EDIT: Ben is correct (and I'm an idiot for saying he wasn't) that there is the possibility that the CPU will re-order the instructions and execute them down multiple pipelines at the same time. This means that the value = 1 could possibly get set before the pipeline performing "the work" has finished. In my defense (not a full idiot?) I have never seen this happen in real life; we do have an extensive thread library, we run exhaustive long-term tests, and this pattern is used throughout. I would have seen it if it were happening, but none of our tests ever crash or produce the wrong answer. But... Ben is correct, the possibility exists.

It is probably happening all the time in our code, but the re-ordering is not setting flags early enough that the consumers of the data protected by the flags can use the data before it's finished. I will be changing our code to include barriers, because there is no guarantee that this will continue to work in the wild. I believe the correct solution is similar to this:

Threads that read the value:

...
if (value)
{
  __sync_synchronize();  // don't pipeline any of the work until after checking value
  DoSomething();
}
...

The thread that sets the value:

...
DoStuff();
__sync_synchronize();  // Don't pipeline "setting value" until after finishing stuff
value = 1;  // Stuff Done
...

That being said, I found this to be a simple explanation of barriers.

COMPILER BARRIER: Memory barriers affect the CPU; compiler barriers affect the compiler. Volatile will not keep the compiler from re-ordering code.

I believe you can use this code to keep gcc from rearranging the code during compile time:

#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

So maybe this is what should really be done?

#define GENERAL_BARRIER() do { COMPILER_BARRIER(); __sync_synchronize(); } while(0)

Threads that read the value:

...
if (value)
{
  GENERAL_BARRIER();  // don't pipeline any of the work until after checking value
  DoSomething();
}
...

The thread that sets the value:

...
DoStuff();
GENERAL_BARRIER();  // Don't pipeline "setting value" until after finishing stuff
value = 1;  // Stuff Done
...

Using GENERAL_BARRIER() keeps gcc from re-ordering the code and also keeps the CPU from re-ordering it. Now, I wonder whether gcc already refuses to re-order code across its memory barrier builtin, __sync_synchronize(), which would make COMPILER_BARRIER redundant.

X86: As Ben points out, different architectures have different rules regarding how they rearrange code in the execution pipelines. Intel seems to be fairly conservative, so the barriers might not be required nearly as much on Intel. That is not a good reason to avoid the barriers, though, since the rules could change.

ORIGINAL POST: We do this all the time. It's perfectly safe (not in all situations, but in a lot of them). Our application runs on thousands of servers in a huge farm with 16 instances per server, and we don't have race conditions. You are correct to wonder why people use mutexes to protect already-atomic operations. In many situations the lock is a waste of time. Reading and writing 32-bit integers is atomic on most architectures. Don't try that with 32-bit bit-fields though!

Processor write re-ordering is not going to affect one thread reading a global value set by another thread. In fact, the result using locks is the same as the result without locks. If you win the race and check the value before it's changed ... well, that's the same as winning the race to lock the value so no-one else can change it while you read it. Functionally the same.

The volatile keyword tells the compiler not to store a value in a register, but to keep referring to the original memory location. This should have no effect unless you are optimizing code. We have found that the compiler is pretty smart about this and have not run into a situation yet where volatile changed anything. The compiler seems to be pretty good at coming up with candidates for register optimization. I suspect that the const keyword might encourage register optimization on a variable.

The compiler might re-order code in a function if it knows the end result will not be different. I have not seen the compiler do this with global variables, because the compiler has no idea how re-ordering accesses to a global variable would affect code outside of the immediate function.

If a function is acting up, you can control the optimization level at the function level using __attribute__.

Now, that said, if you use that flag as a gateway to allow only one thread of a group to perform some work, that won't work. Example: Thread A and Thread B both could read the flag. Thread A gets scheduled out. Thread B sets the flag to 1 and starts working. Thread A wakes up, sets the flag to 1, and starts working. Ooops! To avoid locks and still do something like that, you need to look into atomic operations, specifically the gcc atomic builtins like __sync_bool_compare_and_swap(&value, old, new). This allows you to set value = new only if value is currently old. In the previous example, if value = 1, only one thread (A or B) could execute __sync_bool_compare_and_swap(&value, 1, 2) and change value from 1 to 2; the losing thread would fail. __sync_bool_compare_and_swap returns whether the operation succeeded.
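
A minimal sketch of that compare-and-swap gateway (the 0/1/2 states follow the example above; the function name is invented):

int value = 0;  // 0 = not ready, 1 = ready, 2 = claimed by a worker

void maybe_do_the_work(void) {
  // Atomically change value from 1 to 2.  Only one thread can win the race;
  // the builtin returns true for the winner and false for everyone else.
  if (__sync_bool_compare_and_swap(&value, 1, 2)) {
    // we won: perform the work exactly once
  } else {
    // either the flag was not 1 yet, or another thread already claimed it
  }
}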

Deep down, there is a "lock" when you use the atomic builtins, but it is a hardware instruction and very fast when compared to using mutexes.

That said, use mutexes when you have to change a lot of values at the same time. Atomic operations (as of today) only work when all the data that has to change atomically fits into a contiguous 8, 16, 32, 64 or 128 bits.

johnnycrash
  • "Processor write re-ordering is not going to affect one thread reading a global value set by another thread." Actually, that's exactly what it does affect. Well, one thread reading multiple values set by other thread(s). The point is, when the worker thread sees that `a` is true, it can't trust the value of any other state unless a memory barrier was used. – Ben Voigt Apr 14 '11 at 17:06
  • @Ben Thanks. I edited my post to make that clear now. How does that look? – johnnycrash Apr 14 '11 at 22:25
  • 1
    I think the reader needs to put the fence after testing `value` and before using the variables which `value` is being used to synchronize. A couple things to note: x86 CPUs have much stronger ordering guarantees than the C++ standard provides, and many OS functions include a memory barrier. So it's entirely possible that your existing code would never fail (but due to implementation details rather than any guarantee, and those details can change on future platforms). – Ben Voigt Apr 14 '11 at 22:44
@johnny: And you don't have to insult yourself. Phrasing like "Ben pointed out something I hadn't thought of" can be much more effective than words like "idiot". – Ben Voigt Apr 14 '11 at 22:48
  • @johnny: I can't be sure that this holds for every compiler, but I would think that the compiler doesn't know if `__sync_synchronize` touches global variables, and therefore can't reorder things past it. Or it recognizes it and treats it specially, and still won't reorder things past it. Function calls are barriers to compiler reordering unless they are inline functions (of course local variables whose address is never taken can be reordered, but this is always safe). So I think `__sync_synchronize` (or Win32 [MemoryBarrier](http://msdn.microsoft.com/en-us/library/ms684208.aspx)) is enough. – Ben Voigt Apr 15 '11 at 04:33
-1

Hm, I guess it is safe, but why don't you just declare a function that returns the value to the other threads, since they will only read it?

Because the simple idea of passing pointers to separate threads is already a safety hazard, in my humble opinion. What I'm telling you is: why give out a (modifiable, publicly accessible) integer's address when you only need the value?

Giuliano
  • 1
    That function has to get the value from somewhere. Moving the read from the thread main function into some helper function doesn't change anything (unless the helper is in a different compilation unit and whole-program-optimization is disabled, in which case the inability to optimize across the function call could have an effect). – Ben Voigt Jan 03 '11 at 22:27
Doesn't it at least avoid throwing pointers around? And that helps too if you have to move or change the memory address of the data. If you have pointers in n instances, you'll have to send them the new address one by one. By the way, I got your point :) – Giuliano Jan 03 '11 at 22:44
-1

Assume the first thing you're doing in the thread function is sleeping for a second. So after that, value will definitely be 1.

crazylammer
-1

In any case, you should at least declare the shared variable volatile. However, in all cases you should prefer some other form of thread IPC or synchronisation; in this case it looks like a condition variable is what you actually need.
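
A minimal sketch of the condition-variable approach, assuming the main thread sets the value once and the worker threads wait for it (the names here are illustrative):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int value = 0;

static void *thread_main(void *arg) {
  (void)arg;
  pthread_mutex_lock(&lock);
  while (value == 0)                  // loop guards against spurious wakeups
    pthread_cond_wait(&cond, &lock);
  // make the decision based on 'value' while still holding the lock
  pthread_mutex_unlock(&lock);
  return NULL;
}

static void publish_value(void) {
  pthread_mutex_lock(&lock);
  value = 1;                          // the write from the question
  pthread_cond_broadcast(&cond);      // wake every waiting reader
  pthread_mutex_unlock(&lock);
}

pthread_cond_wait atomically releases the mutex while waiting and re-acquires it before returning, so the check and the wait do not race with the writer.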

Clifford