1

I have this very simple multi-threaded program where I am reading a bunch of numbers and eventually adding them to a global sum. I am using the local_sum variable in each thread which I then add to the global sum. The code is given at the end.

The question is - Do I need to lock this line of code in the thread_runner function?

sum += local_sum;

I am pretty sure that I do need to lock this however the program always gives the right result anyway. If I do need locks, how do I prove that this implementation is wrong?

Note: I am only interested in mutex locks as the synchronization primitive here. Also please ignore that I have not checked return values.

Code:

#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void *thread_runner(void *index);
// pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

long array[100000000];
long sum = 0;

int main() {
    /* Insert values in the array */
    long i = 0;
    for (i = 0; i < 100000000; i++) {
        array[i] = i+1;
    }

    /* Declare thread variables */
    pthread_t t1, t2, t3, t4;
    int index1 = 0;
    int index2 = 25000000;
    int index3 = 50000000;
    int index4 = 75000000;

    /* Create 4 threads */
    int rc = pthread_create(&t1, NULL, thread_runner, &index1);
    rc = pthread_create(&t2, NULL, thread_runner, &index2);
    rc = pthread_create(&t3, NULL, thread_runner, &index3);
    rc = pthread_create(&t4, NULL, thread_runner, &index4);

    /* Wait for threads to complete */
    rc = pthread_join(t1, NULL);
    rc = pthread_join(t2, NULL);
    rc = pthread_join(t3, NULL);
    rc = pthread_join(t4, NULL);

    printf("Sum: %ld\n", sum);

    return 0;
}


void *thread_runner(void *index) {
    int i = 0;
    int i_index = *((int*)index);
    long local_sum = 0;
    for (i = i_index; i < i_index+25000000; i++) {
        local_sum += array[i];
    }
    // Do i need to lock the following statement ?
    sum += local_sum;
}
Taimoor Zaeem
  • 190
  • 1
  • 12
  • 5
    *how do I prove that this implementation is wrong?* That's backwards. I hope someone's not asking you to prove it's wrong - that person is the one who's wrong. You need to prove it's correct in all cases - and you can't do that because it's a race condition. "Works for me" is a horribly low standard and one only used by incompetent programmers. – Andrew Henle Dec 16 '22 at 12:22
  • 3
    Does this answer your question? [Are +=, |=, &= etc atomic?](https://stackoverflow.com/questions/2468857/are-etc-atomic) – GSerg Dec 16 '22 at 12:25
  • This is not thread safe. If you get the correct result, it's because you're lucky, not because the code is correct. What I would do is create an array of local_sum in the calling function and add them up after all the joins. You'd need to make structs to pass index and local_sum pointers. – Simon Goater Dec 16 '22 at 12:32
  • 1
    If you're looking for a way to get the wrong answer, I suggest you reduce the work by a factor of a million say, then loop around a million or so times, checking each time if you got the correct answer. – Simon Goater Dec 16 '22 at 12:40
  • Bolding "how do I prove that this implementation is wrong?" doesn't change anything. It **IS** wrong. Full stop. `sum += local_sum;` **IS** a [data race condition](https://en.wikipedia.org/wiki/Race_condition). Race conditions [invoke undefined behavior](https://port70.net/~nsz/c/c11/n1570.html#J.2p1): "The execution of a program contains a data race" – Andrew Henle Dec 16 '22 at 12:44
  • @AndrewHenle I agree. I was just looking for a way to show it through output. – Taimoor Zaeem Dec 16 '22 at 13:01
  • 1
    @TaimoorZaeem if you want to provoke errors use the global sum always instead of using a local sum and add it later. Its like probability. If you only use the conflictive variable once it is more unlikely to have some error. However, if u constantly access the global variable it should give some errors. – LexFerrinson Dec 16 '22 at 13:43
  • @AndrewHenle I think that instead of dogmatizing a response you could aid this young programer to discover why it is wrong. That is how knowledge is built and that is what stack overflow is about. Backward (as you mention) is how knowledge is built. – LexFerrinson Dec 16 '22 at 13:51

1 Answers1

2

how do I prove that this implementation is wrong?

You point to the line sum += local_sum, and you say, "See! The threads access the same shared variable without any locking and without any other form of synchronization," and then you point to the section in the language spec or you point to some other suitable authority that says "...undefined behavior."

The reason why data races are bad is that they are unpredictable. You cannot depend on a program containing a data race to give the right answer, and you cannot depend on it to give the wrong answer either. You cannot depend on testing to find data races.

Solomon Slow
  • 25,130
  • 5
  • 37
  • 57