
Here I use Peterson's algorithm to implement mutual exclusion.

I have two very simple threads: one increments a counter by 1, the other decrements it by 1.

#include <stdio.h>
#include <pthread.h>

const int PRODUCER = 0, CONSUMER = 1;
int counter;
int flag[2];   /* flag[i] == 1 means thread i wants to enter the critical section */
int turn;      /* whose turn it is to wait */

void *producer(void *param)
{
    /* Peterson's entry protocol */
    flag[PRODUCER] = 1;
    turn = CONSUMER;
    while (flag[CONSUMER] && turn == CONSUMER)
        ;   /* busy-wait */

    counter++;            /* critical section */

    flag[PRODUCER] = 0;   /* exit protocol */
    return NULL;
}

void *consumer(void *param)
{
    /* Peterson's entry protocol */
    flag[CONSUMER] = 1;
    turn = PRODUCER;
    while (flag[PRODUCER] && turn == PRODUCER)
        ;   /* busy-wait */

    counter--;            /* critical section */

    flag[CONSUMER] = 0;   /* exit protocol */
    return NULL;
}

They work fine when I just run them once.

But when I run them again and again in a loop, strange things happen.

Here is my main function.

int main(int argc, char *argv[])
{
    int case_count = 0;
    counter = 0;

    /* Repeat until one run leaves the counter non-zero. */
    while (counter == 0)
    {
        printf("Case: %d\n", case_count++);
        pthread_t tid[2];
        pthread_attr_t attr[2];

        pthread_attr_init(&attr[0]);
        pthread_attr_init(&attr[1]);

        counter = 0;
        flag[0] = 0;
        flag[1] = 0;
        turn = 0;

        printf("Counter is initially set to %d\n", counter);

        pthread_create(&tid[0], &attr[0], producer, NULL);
        pthread_create(&tid[1], &attr[1], consumer, NULL);

        pthread_join(tid[0], NULL);
        pthread_join(tid[1], NULL);

        printf("counter is now %d\n", counter);
    }

    return 0;
}

I run the two threads again and again, until in one case the counter isn't zero.

And after some number of cases, the program always stops! Sometimes after hundreds of cases, sometimes thousands, or even tens of thousands.

That means in one case the counter isn't zero. But why? The two threads modify the counter inside the critical section, and each increments or decrements it only once. Why isn't the counter zero?

Then I ran this code on other computers, and even stranger things happened: on some computers the program seems to have no problem, while others have the same problem as mine! Why?

By the way, on my computer I run this code in a VMware virtual machine with Ubuntu 16.04. The other computers also run Ubuntu 16.04, but not all of them are virtual machines. The computers that show the problem include both virtual and physical machines.

Lixin Wei
  • I took only a short glance but: two threads access shared non-atomic variables without mutex guarding - that may cause non-determinism. Now, additional effects come into play (i.e. random effects). On which core does each thread run? Do they run in real or pseudo concurrency? Are the caches synchronized or not? Are variables buffered in (different) registers? Is the execution order of statements (which are independent from the system's point of view) changed? These are all questions you cannot answer. – Scheff's Cat Nov 23 '17 at 13:23
  • This might be interesting: [SO: What is a race condition?](https://stackoverflow.com/questions/34510/what-is-a-race-condition). – Scheff's Cat Nov 23 '17 at 13:31
  • @Scheff Yes, you are right, I think. Both threads modify the variable `turn`. I added a lock around the modification of `turn` and everything works fine. But does it mean Peterson's algorithm is wrong? – Lixin Wei Nov 23 '17 at 13:34
  • "But does it means Peterson's algorithm is wrong?" I don't believe so. He probably didn't give the code in C or C++. In other words, he assumed/defined additional grants which are not provided by C or C++ as you used it. Take a look at [atomic](http://en.cppreference.com/w/cpp/atomic). This is also nice: [Race Condition vs. Data Race](https://blog.regehr.org/archives/490). I'm still looking for a slide show (in google) which I used once to clear my mind regarding this. – Scheff's Cat Nov 23 '17 at 13:39
  • ...finally found it: [Things You Never Wanted to Know about Memory Fences](http://passthrough.fw-notify.net/download/426763/http://nwcpp.org/talks/2008/Memory_Fences.pdf). – Scheff's Cat Nov 23 '17 at 13:56
  • @Scheff Nice, thanks a lot. – Lixin Wei Nov 23 '17 at 13:59

2 Answers


You need hardware support to implement any kind of thread-safe algorithm.

There are many reasons why your code is not working as you intended. The simplest one is that the cores have individual caches. Say your program starts on two cores. Both cache flag as {0, 0}. They both modify their own copy, so they don't see what the other core is doing.

In addition, memory works in blocks, so writing flag[PRODUCER] will very likely write flag[CONSUMER] as well (because ints are 4 bytes and most of today's processors use 64-byte memory blocks).

Another problem is operation reordering. Both the compiler and the processor are allowed to swap instructions. There are constraints that dictate that the single-threaded execution result shouldn't change, but obviously they don't apply here.

The compiler might also figure out that you are setting turn to x and then checking whether it is x, which is obviously true in a single-threaded world, so the check can be optimized away.
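
As a hypothetical sketch of that last point (not what any particular compiler is guaranteed to emit), here is the kind of transformation the optimizer is allowed to make on the producer's entry protocol when flag and turn are plain non-atomic ints, since it may assume no other thread touches them:

    // Sketch only: one legal single-threaded rewrite of the entry protocol.
    int flag[2];
    int turn;
    const int PRODUCER = 0, CONSUMER = 1;

    void producer_entry_as_written()
    {
        flag[PRODUCER] = 1;
        turn = CONSUMER;
        while (flag[CONSUMER] && turn == CONSUMER)
            ;
    }

    void producer_entry_after_legal_optimization()
    {
        flag[PRODUCER] = 1;
        turn = CONSUMER;        // from here on, turn == CONSUMER as far as
                                // single-threaded reasoning is concerned
        if (flag[CONSUMER])     // flag[CONSUMER] read once, hoisted out of the loop:
            for (;;)            // either skip the wait entirely or spin forever
                ;
    }

    int main()
    {
        producer_entry_as_written();   // with the globals all zero, returns at once
        return 0;
    }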

This list is not exhaustive. There are many more things (some platform specific) that could happen and break your program.

So, at the very least, try to use std::atomic types with strong memory ordering (memory_order_seq_cst). All your shared variables should be std::atomic. This gives you the hardware support, but it will be a lot slower.
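
As a rough sketch of that advice, here is what the question's code could look like with std::atomic and the default memory_order_seq_cst (assuming something like g++ -std=c++11 -pthread); this illustrates the suggestion rather than being a tested drop-in replacement:

    #include <atomic>
    #include <pthread.h>
    #include <cstdio>

    const int PRODUCER = 0, CONSUMER = 1;

    std::atomic<int> counter{0};
    std::atomic<int> flag[2] = {{0}, {0}};
    std::atomic<int> turn{0};

    void *producer(void *)
    {
        flag[PRODUCER].store(1);          // seq_cst store (the default ordering)
        turn.store(CONSUMER);
        while (flag[CONSUMER].load() && turn.load() == CONSUMER)
            ;                             // seq_cst loads in the spin loop
        counter.fetch_add(1);             // atomic read-modify-write
        flag[PRODUCER].store(0);
        return NULL;
    }

    void *consumer(void *)
    {
        flag[CONSUMER].store(1);
        turn.store(PRODUCER);
        while (flag[PRODUCER].load() && turn.load() == PRODUCER)
            ;
        counter.fetch_sub(1);
        flag[CONSUMER].store(0);
        return NULL;
    }

    int main()
    {
        pthread_t tid[2];
        pthread_create(&tid[0], NULL, producer, NULL);
        pthread_create(&tid[1], NULL, consumer, NULL);
        pthread_join(tid[0], NULL);
        pthread_join(tid[1], NULL);
        std::printf("counter is now %d\n", counter.load());
        return 0;
    }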

This will still not work everywhere, because you might still have some piece of code where you read and then change a value. That is not atomic, because some other thread might have changed the data after your read and before your write.
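
For example, a read followed by a separate write leaves a lost-update window even when the variable itself is atomic; a single read-modify-write such as fetch_add closes that window. A minimal sketch (names are just for illustration):

    #include <atomic>
    #include <cstdio>

    std::atomic<int> counter{0};

    void unsafe_increment()
    {
        // Two separate atomic operations: another thread can update counter
        // between the load and the store, and that update is then lost.
        int tmp = counter.load();
        counter.store(tmp + 1);
    }

    void safe_increment()
    {
        // One indivisible read-modify-write: no window for a lost update.
        counter.fetch_add(1);
    }

    int main()
    {
        unsafe_increment();
        safe_increment();
        std::printf("%d\n", counter.load());   // 2 (single-threaded demo only)
        return 0;
    }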

Sorin
  • Thanks a lot. But I still have some questions. 1. Doesn't the CPU have any data-synchronization mechanism to ensure one core won't read data from its cache that has already been modified by another core? (I think it must have such measures, or everything would go wrong.) 2. I compile my code with `-O0`, which seems to disable all compiler optimizations. So what exactly makes Peterson's algorithm not work? Would you please give me a specific situation? – Lixin Wei Nov 23 '17 at 16:20
  • @LixinWei 1. Yes it does. You use it with atomic and memory_order specifications. This is turned off by default because most data today is accessed from a single thread, and skipping it gives you a HUGE boost in speed. – Sorin Nov 24 '17 at 08:42
  • @LixinWei 2. Everything other than compiler reordering and some optimization will still happen. You set turn to PRODUCER, but that doesn't mean the write goes to main memory. Most likely it will just apply to the L1 (core-specific) cache and will be copied to RAM later (unless you use atomic and memory ordering correctly). You set flag[0], but when the block is copied back to RAM it will overwrite flag[1] as well, because it's likely in the same block. – Sorin Nov 24 '17 at 08:48
  • @LixinWei under -O0, `counter++` is translated into a read-modify-write sequence. Even with a proper mutex, if memory access order is not enforced it can still go wrong, because the write was not propagated to main memory or to the other core in time. – Sorin Nov 24 '17 at 08:51

Peterson's algorithm only works on single core processors/single CPU systems.

That's because they don't do real parallel processing. Two atomic operations never get executed at the same time there.

If you have 2 or more CPUs/CPU cores, the number of atomic operations that can execute at the same time increases by one for each CPU (core). This means that even if an integer assignment is atomic, it can be executed at the same time on different CPUs/cores.

In your case, turn=CONSUMER; and turn=PRODUCER; are simply executed at the same time on different CPUs/cores.

Deactivate all CPU cores but one for your program and it should work fine.
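
One way to approximate this on Linux, instead of disabling cores system-wide, is to pin the whole process to a single CPU, either externally with `taskset -c 0 ./a.out` or programmatically with `sched_setaffinity` before creating the threads. A minimal sketch (Linux-specific, error handling kept short):

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE           // for sched_setaffinity / CPU_SET on glibc
    #endif
    #include <sched.h>
    #include <cstdio>

    int main()
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);         // allow only CPU 0

        // Pin the calling process; threads created afterwards inherit the mask.
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            std::perror("sched_setaffinity");
            return 1;
        }

        // ...create the producer/consumer threads here; they now share one core.
        return 0;
    }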

Detonar
  • Would you please explain more specifically why Peterson's algorithm won't work on a multi-processor system? Why would an atomic operation be executed on both CPU cores? – Lixin Wei Nov 23 '17 at 15:58
  • While executing an atomic instruction, the CPU core does not perform any context switch until the instruction is finished. That's why Peterson's algorithm works. If you have 2 threads and each thread is executed on another core, they can run at the same time. Therefore each thread can execute its own instructions independently of the other, which means that if both threads on different cores execute the same line of atomic code at the same time, they behave just as if they had been interleaved by the scheduler inside a non-atomic instruction. – Detonar Nov 24 '17 at 07:13
  • It's the very purpose of multi-core processors to execute code independently of each other. That's why they are so fast. But from that point on you need cross-CPU synchronization mechanisms like semaphores, or facilities built on them like mutexes. – Detonar Nov 24 '17 at 07:13
  • So, in my case, if `turn=CONSUMER/PRODUCER` is called at the same time, does it mean that on one core `turn` is 0 and on the other it is 1? And `counter++` will be called twice? But in my testing, when the program stops the counter is either -1 or 1, never 2. Why? – Lixin Wei Nov 24 '17 at 09:34
  • I think something like this could happen (c = counter; rp = register of the producer's CPU; rc = register of the consumer's CPU): `p: load counter into register (rp=c=0)` => `c: load counter into register (rc=c=0)` => `p: increment value in register (rp=1)` => `c: decrement value in register (rc=-1)` => `p: write register back to counter (c=rp=1)` => `c: write register back to counter (c=rc=-1)`. After this, `counter` contains `-1` (or `1` if you swap producer and consumer in this example). – Detonar Nov 24 '17 at 10:02
  • In my example, the call of `turn=CONSUMER/PRODUCER` is just the breaking point of Peterson's algorithm, resulting in further thread-unsafe behavior like the one I mentioned in the comment above. – Detonar Nov 24 '17 at 10:06