6

If I have some code that looks something like:

typedef struct {
    bool some_flag;

    pthread_cond_t  c;
    pthread_mutex_t m;
} foo_t;

// I assume the mutex has already been locked, and will be unlocked
// some time after this function returns. For clarity. Definitely not
// out of laziness ;)
void check_flag(foo_t* f) {
    while(f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
}

Is there anything in the C standard preventing an optimizer from rewriting check_flag as:

void check_flag(foo_t* f) {
    bool cache = f->some_flag;
    while(cache)
        pthread_cond_wait(&f->c, &f->m);
}

In other words, does the generated code have to follow the f pointer every time through the loop, or is the compiler free to pull the dereference out?

If it is free to pull it out, is there any way to prevent this? Do I need to sprinkle a volatile keyword somewhere? It can't be check_flag's parameter because I plan on having other variables in this struct that I don't mind the compiler optimizing like this.

Might I have to resort to:

void check_flag(foo_t* f) {
    volatile bool* cache = &f->some_flag;
    while(*cache)
        pthread_cond_wait(&f->c, &f->m);
}
Clark Gaebel

4 Answers

7

In the general case, even if multi-threading weren't involved and your loop looked like:

void check_flag(foo_t* f) {
    while(f->some_flag)
        foo(&f->c, &f->m);
}

the compiler would be unable to cache the f->some_flag test. That's because the compiler can't know whether or not a function (like foo() above) might change whatever object f is pointing to.

Under special circumstances (foo() is visible to the compiler, and all pointers passed to check_flag() are known not to be aliased or otherwise modifiable by foo()), the compiler might be able to optimize the check.
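As a sketch of such a special circumstance (the names bar_t, square and spin are invented for illustration, not from the question): when the callee is a static leaf function whose body is fully visible, the compiler can prove it never writes to the struct, so it may legally hoist the load out of the loop.

```c
#include <stdbool.h>

typedef struct {
    bool some_flag;
} bar_t;

/* A leaf function fully visible to the compiler: it touches only its
   own argument, so the compiler can prove it never writes a bar_t. */
static int square(int x) { return x * x; }

int spin(bar_t* b) {
    int total = 0;
    /* Because square() provably cannot modify b->some_flag, the
       compiler may read the flag once, cache it in a register, and
       never re-read it inside the loop. */
    while (b->some_flag)
        total = square(total + 1);
    return total;
}
```

With an opaque call such as pthread_cond_wait() in the loop body, neither condition holds, which is exactly why the hoist is forbidden there.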

However, pthread_cond_wait() must be implemented in a way that would prevent that optimization.

See Does guarding a variable with a pthread mutex guarantee it's also not cached?:

You might also be interested in Steve Jessop's answer to: Can a C/C++ compiler legally cache a variable in a register across a pthread library call?

But how far you want to take the issues raised by Boehm's paper in your own work is up to you. As far as I can tell, if you want to take the stand that pthreads doesn't/can't make the guarantee, then you're in essence taking the stand that pthreads is useless (or at least provides no safety guarantees, which I think by reduction has the same outcome). While this might be true in the strictest sense (as addressed in the paper), it's also probably not a useful answer. I'm not sure what option you'd have other than pthreads on Unix-based platforms.

Michael Burr
  • I wish I could accept two answers, but I can't. I just picked the one with the lowest rep. They were equally helpful though! – Clark Gaebel Jan 14 '11 at 00:41
    This is the best answer, but I like OP's rationale for accepting the other. :-) For what it's worth, pthread synchronization functions are specified to be full memory barriers. How this is implemented is none of the application's business; it's just guaranteed to work. – R.. GitHub STOP HELPING ICE Jan 14 '11 at 01:14
  • "_pthread_cond_wait() must be implemented in a way that would prevent that optimization._" I am curious how `pthread_cond_wait` could be reasonably implemented in a way that allows the (incorrect) optimisation! – curiousguy Oct 02 '11 at 04:21
3

Normally, you should lock the pthread mutex before waiting on the condition variable, as the pthread_cond_wait call releases the mutex (and reacquires it before returning). So your check_flag function should be rewritten like this to conform to the semantics of pthread condition variables.

void check_flag(foo_t* f) {
    pthread_mutex_lock(&f->m);
    while(f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
    pthread_mutex_unlock(&f->m);
}
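For completeness, the writer side that this waiter pairs with might look like the sketch below (clear_flag and clearer are assumed names, not from the question): the flag is modified under the same mutex, and the condition is signaled before the mutex is released.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct {
    bool some_flag;
    pthread_cond_t  c;
    pthread_mutex_t m;
} foo_t;

/* Waiter: takes the lock itself, blocks until the flag is cleared. */
void check_flag(foo_t* f) {
    pthread_mutex_lock(&f->m);
    while (f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
    pthread_mutex_unlock(&f->m);
}

/* Writer (assumed name): clear the flag under the mutex, then signal. */
void clear_flag(foo_t* f) {
    pthread_mutex_lock(&f->m);
    f->some_flag = false;
    pthread_cond_signal(&f->c);
    pthread_mutex_unlock(&f->m);
}

/* Demo thread routine driving the writer side. */
static void* clearer(void* arg) {
    usleep(10000);            /* give the waiter a moment to block */
    clear_flag((foo_t*)arg);
    return NULL;
}
```

Because the flag is only ever read or written while the mutex is held, there is no window in which the waiter can miss the signal: if the flag is cleared before the waiter locks the mutex, the while loop simply never enters pthread_cond_wait.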

Concerning the question of whether or not the compiler is allowed to optimize the reading of the flag field, this answer explains it in more detail than I can.

Basically, the compiler knows about the semantics of pthread_cond_wait, pthread_mutex_lock and pthread_mutex_unlock. It knows that it can't optimize memory reads across those calls (the call to pthread_cond_wait in this example). There is no notion of a memory barrier here, just special knowledge of certain functions, and some rules to follow in their presence.

There is another thing you need protection from: reordering performed by the processor. Your average processor is capable of reordering memory accesses (reads/writes) provided the observable semantics are preserved, and it does this constantly (as it increases performance). However, this breaks down when more than one processor can access the same memory address. A memory barrier is an instruction telling the processor that it may not move reads/writes issued before the barrier past it; it must finish them before continuing.
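In C11 this kind of hardware-level ordering is exposed portably through atomic_thread_fence. A minimal release/acquire publication sketch (the names data, ready, producer and consumer are invented for illustration):

```c
#include <stdatomic.h>
#include <stdbool.h>

int         data  = 0;
atomic_bool ready = false;

void producer(void) {
    data = 42;                                   /* plain write...        */
    atomic_thread_fence(memory_order_release);   /* ...may not sink below */
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

int consumer(void) {
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;                                        /* spin until published  */
    atomic_thread_fence(memory_order_acquire);   /* reads may not hoist above */
    return data;
}
```

The release fence guarantees the write to data is visible before the write to ready; the acquire fence guarantees the read of data happens after the read of ready. The pthread lock/unlock calls give you equivalent ordering implicitly, which is why the question's code needs no explicit fences.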

Sylvain Defresne
  • Does that mean that the compiler can't cache the value of `p->some_flag` in a register? I'm not sure of the implications of a memory barrier. Mind explaining them a bit? – Clark Gaebel Jan 14 '11 at 00:14
    The inability of the compiler to cache the value has nothing to do with memory barriers or threads; it's simply a consequence of the fact that the call could modify the flag. From the compiler's perspective, it's the actual call to `pthread_cond_wait`, not some other thread, that might have modified the flag. – R.. GitHub STOP HELPING ICE Jan 14 '11 at 01:16
  • "_Basically, the compiler know about the semantic of pthread_cond_wait, pthread_mutex_lock and pthread_mutex_unlock._" Care to mention one compiler that does? I bet most compilers have no idea about the semantics of pthread_cond_wait, pthread_mutex_lock and pthread_mutex_unlock. – curiousguy Oct 02 '11 at 04:18
3

As written, the compiler is free to cache the result as you describe, or even in a more subtle way: by keeping it in a register. You can prevent this optimization by making the variable volatile. But that is not necessarily enough, and you should not code it this way! You should use condition variables as prescribed (lock, wait, unlock).

Trying to work around the library is bad, but it gets worse. Perhaps reading Hans Boehm's paper on the general topic from PLDI 2005 ("Threads Cannot be Implemented as a Library"), or many of his follow-on articles (which led up to the work on a revised C++ memory model), will put the fear of God in you and steer you back to the straight and narrow :).

EmeryBerger
1

Volatile is for this purpose. Relying on the compiler to know about pthread coding practices seems a little nuts to me, though; compilers are pretty smart these days. In fact, the compiler probably sees that you are looping to test a variable and won't cache it in a register for that reason, not because it sees you using pthreads. Just use volatile if you really care.

Kind of a funny little note. We have a VOLATILE #define that is either "volatile" (when we think the bug can't possibly be our code...) or blank. When we think we have a crash due to the optimizer killing us, we #define it "volatile", which puts volatile in front of almost everything. We then test to see if the problem goes away. So far... the bugs have been the developer and not the compiler! Who'd have thought!? We have developed a high-performance "non-locking" and "non-blocking" threading library. We have a test platform that hammers it to the point of thousands of races per second. So far, we have never detected a problem needing volatile! So far gcc has never cached a shared variable in a register. Yeah... we are surprised too. We are still waiting for our chance to use volatile!
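The build-time toggle described above might look something like this (PARANOID and VOLATILE are assumed macro names; the original define is not shown in the answer):

```c
/* Compile with -DPARANOID to sprinkle volatile over shared state,
   or without it to compile the qualifier away entirely. */
#ifdef PARANOID
#  define VOLATILE volatile
#else
#  define VOLATILE
#endif

/* Every shared variable gets the toggle-able qualifier. */
VOLATILE int shared_counter = 0;

int bump(void) {
    shared_counter++;
    return shared_counter;
}
```

If a suspected optimizer bug disappears under -DPARANOID, the caching is real; if it persists, as the answer reports has always been the case, the bug is in the code, not the compiler.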

johnnycrash
  • "_compilers are pretty smart these days._" Since when do compilers assume that a call to a function has no effect on anything? – curiousguy Oct 02 '11 at 04:29