-2

The program in question is this one:

#include <pthread.h>
#include <stdio.h>

#define NUM_LOOPS 500000000
long long sum = 0;

/* Add or subtract the offset specified in arg from the sum NUM_LOOPS times */
void* counting_function(void* arg) {
    int offset = *(int*)arg;
    for (int i = 0; i < NUM_LOOPS; i++) {
        sum += offset;
    }
    pthread_exit(NULL);
}

int main(void) {
    // Spawn the threads
    pthread_t id1;
    int offset1 = 1;
    pthread_create(&id1, NULL, counting_function, &offset1);

    pthread_t id2;
    int offset2 = -1;
    pthread_create(&id2, NULL, counting_function, &offset2);

    // Wait for threads to finish
    pthread_join(id1, NULL);
    pthread_join(id2, NULL);

    printf("Sum = %lld\n", sum);

    return 0;
}

I have a global variable long long sum = 0 and a macro for NUM_LOOPS 500000000 that specifies the number to add up to.

The function void* counting_function(void* arg) just adds the offset to sum NUM_LOOPS times.

If I were to run this without threads but calling the functions one after another like this

// sum is 0
counting_function(1);
// sum is now equal to 500000000
countnig_function(-1); 
// sum is now equal to 0

I would get the correct answer, 0.

But when using threads, I get a non-zero answer, different every time.

gcc -Wall -Wextra -Werror -std=c99 -pthread count.c && ./a.out

Sum = 40098157
./a.out
Sum = 303575386

It seems to me that in whatever order you add or subtract 1 from a number, if the total amount of summations and subtractions is the same, then the answer should always be 0.

For example, let's say that NUM_LOOPS = 10. The first thread adds 1 three times in a row, then the second thread subtracts 1 two times in a row. The sum is now going to be equal to 1. Then the first thread does the remaining seven additions, so sum is equal to 8 and then the second thread subtracts the final eight times. The sum is 0.

How can you possibly get an answer that is different than 0? Having an answer different than 0 would mean that one of the two threads added or subtracted more than the specified amount in NUM_LOOPS.

The problem can be solved by adding mutual exclusion to the critical section sum += offset; inside the counting function, but I can't seem to figure out why it wouldn't work without mutex.

What am I missing?

kiraleos
  • 49
  • 1
  • 5
  • 1
    When a thread adds to `sum`, it does it in multiple steps: Load `sum` into a register. Add a value to the register. Store the value from the register into `sum`. (This might be two steps, such as load-and-add in one step and store in another step.) If a thread switch occurs during this, you can get an interleaving: Thread 1 loads `sum`, say with value 400 into a register and adds 1 to the register, getting 401. There is a switch, and thread 2 loads `sum`, still with value 400, into a register and adds −1, getting 399. Then it stores it, so `sum` is 399. Later, thread 1 stores 401. – Eric Postpischil Apr 17 '20 at 02:45
  • 1
    Thus, the decrement to 399 is lost; it was overridden by the delayed store of 401. – Eric Postpischil Apr 17 '20 at 02:45
  • @EricPostpischil - those two comments would make a fine answer. – David C. Rankin Apr 17 '20 at 02:54
  • @EricPostpischil Even though I had learnt this a while back, it didn't occur to me. Thank you for the answer! – kiraleos Apr 17 '20 at 02:56

4 Answers4

3

When a thread adds to sum, it does it in multiple steps: Load sum into a register. Add a value to the register. Store the value from the register into sum. (This might be two steps, such as load-and-add in one step and store in another step.) If a thread switch occurs during this, you can get an interleaving: Thread 1 loads sum, say with value 400 into a register and adds 1 to the register, getting 401. There is a switch, and thread 2 loads sum, still with value 400, into a register and adds −1, getting 399. Then it stores it, so sum is 399. Later, thread 1 stores 401.

Thus, the decrement to 399 is lost; it was overridden by the delayed store of 401.

And of course, thread 2 might run for a while before there is a switch back to thread 1, so there might be quite a number of decrements by thread 2 that are erased by the delayed store from thread 1.

What is likely to happen is a random selection of this interleaving effectively erasing various numbers of adds from one thread or another. Many of them will cancel—some erases of adds will cancel some erases of subtracts, and the results over a number of runs will show a distribution of sums centered on zero (or perhaps near it if the fact that one thread is started before the other skews the distribution).

That is only one problem that can happen with multithreading. Particularly since sum is long long it may be implemented in some target architectures using multiple machine words, so it has to be loaded in multiple steps and stored in multiple steps. Then multithreading can result in one thread storing part of a new value, another thread coming in and modifying sum, and the first thread finishing its storing of the other part. This can produce results more catastrophic than mere missing increments or decrements; it can corrupt the value of sum more profoundly. To avoid this, you want to use an atomic object.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
2

The pthreads library shoehorns parallelism into a language that was designed for single-threaded operation. I recommend studying memory models and how they apply to multi-threaded programming.

Have a look at the Threads and data races section here. There is an example relevant to your problem here.

This wikipedia article has a good section covering race conditions.

It's interesting to note that this problem predates pthreads. You can observe the same behaviour in a 'single-threaded' C program with a signal handler, where both the main program and the signal handler are updating the same variable.

rtx13
  • 2,580
  • 1
  • 6
  • 22
  • c++ is an example of how to do things right? Is this bizarro world? It can't even support pthreads properly; and is just waiting for that asteroid to seal its extinction. – mevets Apr 17 '20 at 02:55
1

Congratulations on trying it out! You’ve discovered that trying to update the same variable from two threads is buggy.

Some solutions you can try. The ones higher up the list are what you should use if possible.

  • Give each thread a separate counter, and add the counts together when the threads have returned.
  • Make the count an atomic variable. Increment and decrement it with atomic_fetch_add_explicit(). In this case, you can use memory_order_relaxed, since the order of updates doesn’t matter. On most processors today, that compiles to a single, lock-free instruction. If it’s important to guarantee that all writes complete before reads, you should use acquire-release memory order instead.
  • Use the receive-copy-update pattern, in which they try to update the last value of the shared date they saw, and if another thread changes it in between, they retry. (Very inappropriate here, but works well for many other situations).
  • Have each thread insert its changes into a wait-free list, and the reader thread reconstructs the state from those. (Again, kind of ridiculous here, but worth considering in general.)
  • Wrap the data to be updated with a lock, or even a reader-writer lock. (Unnecessary here, but some resources require it.)
Davislor
  • 14,674
  • 2
  • 34
  • 49
0

What am I missing? The first thing that you are missing is that "yeah yeah I know" is embarrassing when you obviously don't.

Your threads have interleaved execution. Thread one might load the value of sum (say 0), and before storing a result to it, the thread two might iterate the loop say 1000000 times ( so sum is now 1000000). Then thread one stores back its copy of sum (0+1), so sum is now 1. And so on. About the only thing you can predict from your program is than sum will have a value somewhere between -NUM_LOOPS and +NUM_LOOPS.

mevets
  • 10,070
  • 1
  • 21
  • 33