C Threaded Programming - Incrementing a shared variable

Question

Hey guys...so I'm trying to brush up on my C threads and a question I've found is this:

Given a global variable int x = 0; implement the function void useless(int n) which creates n threads which in a loop increase x by 1, each thread terminates when x reaches 100.

I just don't have a handle on threads and need a solid example to base my foundation. This must use pthread system calls as much as possible.

Have you tried doing things like googling for pthreads example? — Cascabel, Oct 07 '09 at 18:42
You should also specify which operating system you're targeting. Unix/POSIX uses pthreads, MS/Windows uses its own API. — David R Tribble, Oct 07 '09 at 20:08
@Loadmaster "This must use pthread system calls as much as possible." seems pretty clear. — Pete Kirkham, Oct 07 '09 at 21:42

Pete Kirkham · Accepted Answer · 2009-10-08T07:25:22.537

First you need to decide what it is you're trying to achieve, and in what ways the instructions of different threads can interlace and prevent that from happening.

The increment operator in C, ++x, is usually implemented as: read the value of x from memory into a register; increment the register; write the value back to memory:

    r₁ ← x_global
    r₁ ← r₁ + 1
    x_global ← r₁

So the value of x_global is incremented by one.

If you have two threads in parallel, then they can interlace destructively

    initial x_global = 99

    r₁ ← x_global     
    compare r₁ 100 
                    r₂ ← x_global          
                    compare r₂ 100 
    r₁ ← r₁ + 1  == 100 
                    r₂ ← r₂ + 1 == 100
    x_global ← r₁    == 100
                    x_global ← r₂     == 100
    r₁ ← x_global     
    compare r₁ 100 
    stop
                    r₂ ← x_global     
                    compare r₂ 100 
                    stop
    final x_global = 100

So the value of x_global is incremented by one, despite both threads incrementing it.

( I'm eliding the effects of caching, which mean that a read of a variable in thread 2 can behave as though it took place before a write in thread 1, even if the write by thread 1 happened before the read by wall clock time. Acquiring and releasing mutexes in pthreads causes memory barriers, which force all reads to behave as though they happened after, and all writes as though they happened before, the acquire or release. )

( The above is equivalent to for ( int r; ( r = x ) < 100; x = r + 1 ) rather than for (; x < 100; x = x + 1 ), which may have an extra read of x and so has another point where threads can interfere. )

Similarly, an increment by one thread can destroy the increment of another thread, allowing threads to end with x < 100:

    initial x_global = 98
                    r₂ ← x_global          
    r₁ ← x_global     
    compare r₁ 100 
    r₁ ← r₁ + 1  == 99
    x_global ← r₁     
    r₁ ← x_global     
    compare r₁ 100 
    r₁ ← r₁ + 1  == 100
    x_global ← r₁     
    r₁ ← x_global     
    compare r₁ 100 
    stop
                    compare r₂ 100 
                    r₂ ← r₂ + 1 == 99
                    x_global ← r₂      
                    ...
    final x_global = 99

So the second increment by the left thread is overwritten by the stale increment from the right thread, and the left thread has already terminated even though the globally visible value of x is now below 100.

You probably know all that, and may want to use a mechanism to protect against it.

I say 'may' as your requirements aren't clear: the thread above terminated when x reached 100; the requirements don't say x has to stay there.

So, since no thread will terminate without writing x_global ← 100, the requirement may in fact be met without any locking, but x may be incremented n*100 times rather than 100 times. ( if the limit was larger than a byte then the writing of x might be non-atomic on some platforms, which could result in an infinite loop if bytes from different threads are mixed together, but for a limit of 100 that won't happen )

One technique is to use a mutex which blocks other threads from running when one thread holds the lock on the mutex. If the lock is acquired before x_global is read, and not released until x_global is written, then the reads and writes of the thread cannot interlace.

    initial x_global = 98

    lock (mutex) 
    mutex locked 
                    lock(mutex) 
                    blocked 
    r₁ ← x_global     
    compare r₁ 100 
    r₁ ← r₁ + 1  == 99
    x_global ← r₁     

    release ( mutex )
                    mutex locked

                    r₂ ← x_global          
                    compare r₂ 100 
                    r₂ ← r₂ + 1 == 100
                    x_global ← r₂      

                    release ( mutex )

    lock (mutex) 
    mutex locked 
    r₁ ← x_global     
    compare r₁ 100 
    release ( mutex )
    stop
                    ...
    final x_global = 100
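As a minimal sketch of the mutex trace above, here is one way useless could look with pthreads. The names x_lock and worker are my own (hypothetical), and I'm assuming the test against 100 should sit inside the same critical section as the increment, so that no thread can push x past 100.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

int x = 0;  /* the shared counter from the question */

/* Hypothetical name: one mutex guarding every read and write of x. */
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: lock, test, increment, unlock; stop once x has reached 100. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&x_lock);
        int done = (x >= 100);   /* test and increment under the same lock */
        if (!done)
            ++x;
        pthread_mutex_unlock(&x_lock);
        if (done)
            break;
    }
    return NULL;
}

/* Create n threads and wait for all of them to finish. */
void useless(int n)
{
    assert(n > 0);
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    if (tids == NULL)
        return;

    int started = 0;
    for (int i = 0; i < n; ++i) {
        /* Only count threads that were actually created. */
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            ++started;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(tids[i], NULL);
    free(tids);
}
```

Calling useless(4) and then reading x after it returns should give 100, since the joins guarantee every worker has exited. Build with something like -pthread on gcc or clang.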

Outside of pthreads, you might want to use your platform's compare-and-swap operation ( __sync_val_compare_and_swap in gcc ), which takes an address, an old value and a new value, and atomically sets the memory at the address to the new value if it was equal to the old value. This lets you write the logic as:

    for ( int v = 0; v < 100; ) {
        int x_prev = __sync_val_compare_and_swap ( &x, v, v + 1 );

        // if the CAS succeeds, the value of x has been set to x_prev + 1
        // otherwise, try again from the value of x that was just observed
        if ( x_prev == v ) 
            v = x_prev + 1;
        else
            v = x_prev;
    }

So if

    initial x_global = 98
    initial v₁  = 0
    initial v₂  = 0

    cmp v₁  100
    x_prev₁ ← CASV ( x_global, v₁, v₁ + 1 ) = 98 ( set fails with x == 98 )

                    cmp v₂  100
                    x_prev₂ ← CASV ( x_global, v₂, v₂ + 1 ) = 98 ( set fails with x == 98 )

    v₁ ← x_prev₁ = 98 // x_prev != v
                    v₂ ← x_prev₂ = 98
                    cmp v₂  100
                    x_prev₂ ← CASV ( x_global, v₂, v₂ + 1 ) = 98 ( set succeeds with x == 99 )

                    v₂ ← x_prev₂ + 1 = 99 // as x_prev == v

    cmp v₁  100
    x_prev₁ ← CASV ( x_global, v₁, v₁ + 1 ) = 99 ( set fails with x == 99 )
    v₁ ← x_prev₁ = 99 // as x_prev != v

    cmp v₁  100
    x_prev₁ ← CASV ( x_global, v₁, v₁ + 1 ) = 99 ( set succeeds with x == 100)
    v₁ ← x_prev₁ + 1 = 100 // as x_prev == v

                    cmp v₂  100
                    x_prev₂ ← CASV ( x_global, v₂, v₂ + 1 ) = 100 ( set fails with x == 100 )

                    v₂ ← x_prev₂  = 100 // as x_prev != v
    cmp v₁  100
                    cmp v₂  100
    stop
                    stop

On each loop, x_global will atomically be set to the value of v + 1 if and only if its previous value was v; if not, v will be set to the value of x_global observed during the CASV operation. This reduces the amount of time locks are held on most implementations: it still requires locking the memory bus for the duration of the CAS operation, but only those operations will be serialised. As performing the CAS is expensive on multi-cores, it probably won't be much better for such a simple case as this.
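To show the CAS loop running in actual threads, here is a rough sketch that wraps it in a pthread worker. It assumes gcc or clang (for the __sync_val_compare_and_swap builtin); the name cas_worker is hypothetical, and the thread creation is only there so the loop has something to race against.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

int x = 0;  /* the shared counter from the question */

/* Thread body: the CAS loop from the answer, unchanged in logic. */
static void *cas_worker(void *arg)
{
    (void)arg;
    for (int v = 0; v < 100; ) {
        /* Atomically: if x == v then x = v + 1; returns the old value of x. */
        int x_prev = __sync_val_compare_and_swap(&x, v, v + 1);

        if (x_prev == v)
            v = x_prev + 1;  /* our increment went through */
        else
            v = x_prev;      /* someone else got there first; retry from their value */
    }
    return NULL;
}

/* Create n threads and wait for all of them to finish. */
void useless(int n)
{
    assert(n > 0);
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    if (tids == NULL)
        return;

    int started = 0;
    for (int i = 0; i < n; ++i) {
        if (pthread_create(&tids[started], NULL, cas_worker, NULL) == 0)
            ++started;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(tids[i], NULL);
    free(tids);
}
```

Because a swap is only ever attempted from some v < 100 to v + 1, x cannot go past 100, and a thread only leaves the loop once it has seen (or written) 100. No mutex is involved; the only pthread calls are create and join.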

and in a few years we will have the C1x standard which will include atomics! (as well as what is clearly a pthreads inspired threading library) — Spudd86, Nov 25 '10 at 18:41

score 2 · Answer 2 · answered Oct 07 '09 at 18:47

You need a mutex to protect the variable. Each thread will lock the mutex, increment the variable and release the mutex. Each thread that doesn't do this is a rogue thread.

Jonathan Leffler


I think that the InterlockedIncrement API doesn't use a mutex; instead it just uses a `LOCK:` opcode prefix. – ChrisW Oct 07 '09 at 18:52
I don't think InterlockedIncrement alone will cut it here, since you have to protect the compare against 100 as well as the increment. – Dan Olson Oct 07 '09 at 18:56
@ChrisW: can you provide a context for the InterlockedIncrement API? @Pete: InterlockedIncrement is not a standard part of C, nor is it a part of POSIX. The POSIX thread primitives start 'pthread_'. You can write one easily enough - but it is not standard. – Jonathan Leffler Oct 07 '09 at 19:48
@ChrisW: also, what is the 'LOCK:' prefix? That is a label in C - are you confusing C with C# perchance? – Jonathan Leffler Oct 07 '09 at 19:49
@Pete: InterlockedIncrement and siblings are part of Win32. Even if there is not a standard POSIX API for it in pthreads, most platforms have something similar. On Linux you have atomic_add and siblings you can use, on Solaris it is atomic_add_32 and similar (man atomic_ops), and I bet you'll find them on most modern OSes (at least I have). If you want to be relatively platform independent, creating a simple .h file with defines mapping in the right implementation depending on the current platform is a piece of cake, as it is normally just a handful of calls and they work the same all over. – Fredrik Oct 07 '09 at 19:58
The LOCK prefix is applied to an x86 instruction to say that the processor bus is locked for the duration, making it into an atomic operation. – Pete Kirkham Oct 07 '09 at 19:59
I know there's an InterlockedIncrement in win32. The OP wanted to use POSIX pthreads. Apache Portable Runtime has a portable interlocked increment, but I think the unix version used mutexes rather than mapping to all the different flavours last time I looked. – Pete Kirkham Oct 07 '09 at 20:03
@Pete: the OP didn't say it had to be POSIX. He said "This must use pthread system calls as much as possible". As there are no atomic POSIX defined operations (that I know of), POSIX falls outside the "as much as possible" requirement. Using locks alone is not always a complete solution, you probably want the variable to be volatile as well. – Fredrik Oct 08 '09 at 21:07

score 0 · Answer 3 · edited Oct 07 '09 at 19:43

What you need is a critical section. Under Windows, this would be EnterCriticalSection, but in the pthread environment, the equivalent is pthread_mutex_lock. See here for some pointers.

edited Oct 07 '09 at 19:43

Pete Kirkham


answered Oct 07 '09 at 19:07

Bob Moore


The markup also smartly allows you to use `backticks` to represent `__code__` which disables all other markup. Don't dis the SO markup unless you actually understand it. – Chris Lutz Oct 07 '09 at 19:37

score -1 · Answer 4 · answered Oct 07 '09 at 19:17

I would think that InterlockedIncrement is sufficient, if it is ok for each thread to exit if X >= 100.

I would never use a critical section unless I really have to, as this can lead to a high level of contention, whereas InterlockedIncrement has no contention at all, at least none that might affect overall performance.
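InterlockedIncrement is a Win32 call, so as a rough stand-in the sketch below uses C11's atomic_fetch_add from <stdatomic.h>. That is a swapped-in analogue, not the API named above, and add_worker is a hypothetical name. It illustrates the "exit if X >= 100" behaviour: the test and the increment are separate atomic steps, so several threads can all see 99 and each add one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

atomic_int x = 0;  /* the shared counter, made atomic for this variant */

/* Thread body: atomic increment, but the test against 100 is a separate step. */
static void *add_worker(void *arg)
{
    (void)arg;
    while (atomic_load(&x) < 100)   /* another thread may increment right after this read */
        atomic_fetch_add(&x, 1);    /* so x can overshoot 100 by up to n - 1 */
    return NULL;
}

/* Create n threads and wait for all of them to finish. */
void useless(int n)
{
    assert(n > 0);
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    if (tids == NULL)
        return;

    int started = 0;
    for (int i = 0; i < n; ++i) {
        if (pthread_create(&tids[started], NULL, add_worker, NULL) == 0)
            ++started;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(tids[i], NULL);
    free(tids);
}
```

No increment is ever lost here, but the final value is only guaranteed to land somewhere between 100 and 100 + n - 1. Whether that is acceptable depends on how strictly "terminates when x reaches 100" is read.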
