1

I'm new at multi-threaded programming and I tried to code the Bakery Lock Algorithm in C.

Here is the code:

int number[N];  // N is the number of threads                                                          
int choosing[N];                

void lock(int id)  {                                                                       
  choosing[id] = 1;                                                         
  number[id] = max(number, N) + 1;                                       
  choosing[id] = 0;                                                         

  for (int j = 0; j < N; j++)                                                   
    {                                                                       
      if (j == id)                                                          
        continue;                                                           

      while (1)                                                             
          if (choosing[j] == 0)                                             
            break;                                                          

      while (1)                                                             
        {                                                                   
          if (number[j] == 0)                                               
            break;                                                          
          if (number[j] > number[id]                                        
              || (number[j] == number[id] && j > id))                       
            break;                                                          
        }                         
    }
}

void unlock(int id)  {
   number[id] = 0;
}

Then I run the following example. I run 100 threads and each thread runs the following code:

  for (i = 0; i < 10; ++i)  {      
      lock(id);                                                                   
      counter++;
      unlock(id);                                              
    }                                                                       

After all threads have been executed, the result of the shared counter is 10 * 100 = 1000 which is the expected value. I executed my program multiple times and the result was always 1000. So it seems that the implementation of the lock is correct. That seemed weird based on a previous question I had because I didn't use any memory barriers/fences. Was I just lucky?

Then I wanted to create a multi-threaded program that will use many different locks. So I created this (full code can be found here):

typedef struct {                                                            
  int number[N];                                                            
  int choosing[N];                                                          
} LOCK;      

and the code changes to:

void lock(LOCK l, int id)                                                        
{                                                                           
  l.choosing[id] = 1;                                                                                                          
  l.number[id] = max(l.number, N) + 1;                                                                                            
  l.choosing[id] = 0;                 
...

Now when executing my program, sometimes I get 997, sometimes 998, sometimes 1000. So the lock algorithm isn't correct.

What am I doing wrong? What can I do in order to fix it?

Is it perhaps a problem now that I'm reading arrays number and choosing from a struct and that's not atomic or something?

Should I use memory fences and if so at which points (I tried using asm("mfence") in various points of my code, but it didn't help)?

Community
  • 1
  • 1
Fooko R.
  • 129
  • 1
  • 10
  • I hope this is just an academic exercise and not something you're actually trying to use... – R.. GitHub STOP HELPING ICE Feb 10 '12 at 23:43
  • @R.. I wouldn't use in practice Bakery's lock in practice :). It's not an exercise either. I'm just trying to get used on how to work with multi-threaded programs. – Fooko R. Feb 10 '12 at 23:44
  • @Fooko R. If so, don't implement primitives like locking yourself. Use standard, working and proven APIs/libraries. – nos Feb 11 '12 at 00:07
  • That too. My point was that this bakery lock is so inefficient and unscalable you shouldn't do it. Most people who try to roll their own locking are unhappy with the performance of system-library-provided locks, so I could see it making sense to roll your own low-level locking with atomic types (especially if you have C11 to work with), but writing a lock that's always going to be tens or hundreds of times slower than the real system-provided locks makes no sense. – R.. GitHub STOP HELPING ICE Feb 11 '12 at 00:16

1 Answers1

1

With pthreads, the standard states that accessing a varable in one thread while another thread is, or might be, modifying it is undefined behavior. Your code does this all over the place. For example:

  while (1)                                                             
      if (choosing[j] == 0)                                             
        break;

This code accesses choosing[j] over and over while waiting for another thread to modify it. The compiler is entirely free to modify this code as follows:

int cj=choosing[j];
while(1)
    if(cj == 0)
       break;

Why? Because the standard is clear that another thread may not modify the variable while this thread may be accessing it, so the value can be assumed to stay the same. But clearly, that won't work.

It can also do this:

while(1)
{
   int cj=choosing[j];
   if(cj==0) break;
   choosing[j]=cj;
}

Same logic. It is perfectly legal for the compiler to write back a variable whether it has been modified or not, so long as it does so at a time when the code could be accessing the variable. (Because, at that time, it's not legal for another thread to modify it, so the value must be the same and the write is harmless. In some cases, the write really is an optimization and real-world code has been broken by such writebacks.)

If you want to write your own synchronization functions, you have to build them with primitive functions that have the appropriate atomicity and memory visibility semantics. You must follow the rules or your code will fail, and fail horribly and unpredictably.

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • Or use C11 or compiler-specific-pre-C11 atomics. – R.. GitHub STOP HELPING ICE Feb 11 '12 at 00:13
  • 1
    @R.. The problem is, I don't know of any platform where `volatile` is guaranteed to work. And the process of confirming that it did actually work is *way* more trouble than it's worth. It might *appear* to work, and in some circumstances, that might be good enough. – David Schwartz Feb 11 '12 at 00:19
  • @DavidSchwartz thanks for answering. But If I use for example `asm("mfence":::"memory")` won't I avoid such memory reorderings you give in your examples? – Fooko R. Feb 11 '12 at 00:24
  • And something more, if that's undefined behaviour how are all the concurrent data structures created? E.g. in a list you won't be able to read `head` of the list because some other thread might changing it (doing an insertion). – Fooko R. Feb 11 '12 at 00:33
  • @FookoR. You use locks, atomic operations, or some other synchronization device that provides the semantics you need. (In your case, mutexes and condition variables would would, as would atomic operations.) – David Schwartz Feb 11 '12 at 00:58
  • @DavidSchwartz I think I should use atomic operations, are you aware of any good book/tutorial/web page about atomic operations when using GCC, Linux? – Fooko R. Feb 11 '12 at 01:04
  • @FookoR. I'd start with the [GCC documentation on atomics](http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html) for reference and this article on [Multithreaded simple data type access and atomic variables](http://www.alexonlinux.com/multithreaded-simple-data-type-access-and-atomic-variables) to fill in any gaps. – David Schwartz Feb 11 '12 at 01:09
  • @DavidSchwartz So I tried doing the following in the code. I defined `#define READ(X) __sync_fetch_and_add(&(X), 0)` and `#define WRITE(X, V) __sync_bool_compare_and_swap(&(X), X, V)` and whenever I'm reading/writing a shared variable (like `l.number`) from a thread I use the defined `READ`s and `WRITE`s. Writes to `choosing[id]` or `number[id]` can only be done by thread with id: `id` so `CAS` inside the `WRITE` should always succeed. Anyway, the code keeps not working (:S). I also used `asm("mfence")` to avoid reordering of the writes. Here is the code: http://bit.ly/zPMDe9 If you can help – Fooko R. Feb 11 '12 at 03:16
  • @DavidSchwartz would **really** appreciate it. – Fooko R. Feb 11 '12 at 03:16
  • Why are you passing `lock` to the `Lock` and `Unlock` functions by value? Why does `Init` modify a lock and then throw it away? – David Schwartz Feb 11 '12 at 03:24
  • @DavidSchwartz Ooops (that was a silly mistake of me)! Thank you :) . Now it works!!! Thank you very much. – Fooko R. Feb 11 '12 at 03:35