How index is affected when multiple threads are running in a loop

Question

I was trying to write a program which runs 5 threads and prints its index accordingly.

Below is the code:

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

int nthreads=5;

void *busy(void* c) {

    int my_busy = *(int *) c;

    printf("Hello World with thread index %d\n", my_busy);

    return NULL;
}    

int main()
{
    pthread_t t1[nthreads];
    void* result;

    for(int i=0; i<nthreads; i++)
    {
        pthread_create(&t1[i], NULL, busy, &i);
    }

    for(int i=0; i<nthreads; i++)
    {
        pthread_join(t1[i], &result);
    }
    return 0;
}

Output obtained:

Hello World with thread index 1
Hello World with thread index 4
Hello World with thread index 2
Hello World with thread index 0
Hello World with thread index 0

Though all 5 threads run why the corresponding indexes are not properly outputted? Why do I tend to loose some indexes and get others twice? For example in this case I have lost 3 and got 0 outputted twice. Though using pthread_join along with pthread_create in one loop solves the problem it doesn't schedule all threads to run in parallel. What should be done in this case to get all indexes printed?

Possible duplicate of [pthread\_create and passing an integer as the last argument](https://stackoverflow.com/q/19232957/608639). The Q&A provides (1) passing the counter as integral using the `void*` parameter; and (2) passing a pointer to the counter using a heap allocation. Also see [How to pass a sequential counter by reference to pthread start routine?](https://stackoverflow.com/q/31060070/608639), which attempts to do what you are doing. — jww, Sep 23 '18 at 05:58
Possible duplicate of [How to pass a sequential counter by reference to pthread start routine?](https://stackoverflow.com/questions/31060070/how-to-pass-a-sequential-counter-by-reference-to-pthread-start-routine) — Makyen, Sep 23 '18 at 22:10

Nominal Animal · Accepted Answer · 2018-09-24T16:44:58.773

Though all 5 threads run why the corresponding indexes are not properly outputted?

You pass a pointer to a variable to each thread, and modify that variable at the same time the thread functions access it. Why would you expect the thread functions to see any particular value? They run concurrently. It is possible for the threads to see a completely impossible, garbled value, if one thread is reading the value while another is modifying it, on some architectures.

For example in this case I have lost 3 and got 0 outputted twice.

Although the machine code generated by e.g. GCC increments the variable accessed by the thread functions after creating each thread function, the value observed by the thread functions may be "old" on some architectures, because no barriers or synchronization is used.

Whether this occurs on your particular machine (without explicit barriers or synchronization), depends on which memory ordering model your machine implements.

For example, on x86-64 (aka AMD64; the 64-bit Intel/AMD architecture), all reads and writes are observed in order, except that stores may be ordered after loads. This means that if initially say i = 0;, and thread A does i = 1;, thread B can still see i == 0 even after thread A modified the variable.

Note that adding the barriers (e.g. _mm_fence() using x86/AMD64 intrinsics provided by <immintrin.h> when using most C compilers) is not enough to ensure each thread sees an unique value, because the start of each thread can be delayed with respect to the real world moment when pthread_create() was called. All they ensure is that at most one thread sees the zero value. Two threads may see value 1, three value 2, and so on; it is even possible for all threads to see value 5.

What should be done in this case to get all indexes printed?

The simplest option is to provide the index to be printed as a value, rather than as a pointer to a variable. In busy(), use

my_busy = (int)(intptr_t)c;

and in main(),

pthread_create(&t1[i], NULL, busy, (void *)(intptr_t)i);

The intptr_t type is a signed integer type capable of holding a pointer, and is defined in <stdint.h> (usually included by including <inttypes.h> instead).

(Since the question is tagged linux, I probably should point out that in Linux, on all architectures, you can use long instead of intptr_t, and unsigned long instead of uintptr_t. There are no trap representations in either long or unsigned long, and every possible long/unsigned long value can be converted to an unique void *, and vice versa; a round-trip is guaranteed to work correctly. The kernel syscall interface requires that, so it is extremely unlikely to change in the future either.)

If you need to pass the pointer to i, but wish each thread to see an unique value, you need to use some sort of synchronization.

The simplest synchronized approach would be to use a semaphore. You could make it global, but using a structure to describe the work parameters, and passing the pointer of the structure (even if same one is used for all worker threads) is more robust:

#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>
#include <stdio.h>

#define  NTHREADS  5

struct work {
    int     i;
    sem_t   s;
};

void *worker(void *data)
{
    struct work *const  w = data;
    int                 i;

    /* Obtain a copy of the value. */
    i = w->i;

    /* Let others know we have copied the value. */
    sem_post(&w->s);

    /* Do the work. */
    printf("i == %d\n", i);
    fflush(stdout);

    return NULL;
}    

int main()
{
    pthread_t    thread[NTHREADS];
    struct work  w;
    int          rc, i;

    /* Initialize the semaphore. */
    sem_init(&w.s, 0, 0);

    /* Create the threads. */
    for (i = 0; i < NTHREADS; i++) {

        /* Create the thread. */
        w.i = i;
        rc = pthread_create(&thread[i], NULL, worker, &w);
        if (rc) {
            fprintf(stderr, "Failed to create thread %d: %s.\n", i, strerror(rc));
            exit(EXIT_FAILURE);
        }

        /* Wait for the thread function to grab its copy. */
        sem_wait(&w.s);
    }

    /* Reap the threads. */
    for (i = 0; i < NTHREADS; i++) {
        pthread_join(thread[i], NULL);
    }

    /* Done. */
    return EXIT_SUCCESS;
}

Because the main thread, the thread that modifies the value seen by each worker thread, participates in the synchronization, so that each worker function reads the value before the next thread is created, the output will always be in increasing order of i.

A much better approach is to create a work pool, where the main thread defines the work to be done collectively by the threads, and the thread functions simply obtain the next chunk of work to be done, in whatever order:

#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <limits.h>
#include <stdio.h>
#include <errno.h>

#define  NTHREADS  5
#define  LOOPS     3

struct work {
    pthread_mutex_t  lock;
    int              i;
};

void *worker(void *data)
{
    struct work *const  w = data;
    int                 n, i;

    for (n = 0; n < LOOPS; n++) {

        /* Grab next piece of work. */
        pthread_mutex_lock(&w->lock);
        i = w->i;
        w->i++;
        pthread_mutex_unlock(&w->lock);

        /* Display the work */
        printf("i == %d, n == %d\n", i, n);
        fflush(stdout);
    }

    return NULL;
}

int main(void)
{
    pthread_attr_t  attrs;
    pthread_t       thread[NTHREADS];
    struct work     w;
    int             i, rc;

    /* Create the work set. */
    pthread_mutex_init(&w.lock, NULL);
    w.i = 0;

    /* Thread workers don't need a lot of stack. */
    pthread_attr_init(&attrs);
    pthread_attr_setstacksize(&attrs, 2 * PTHREAD_STACK_MIN);

    /* Create the threads. */
    for (i = 0; i < NTHREADS; i++) {
        rc = pthread_create(thread + i, &attrs, worker, &w);
        if (rc != 0) {
            fprintf(stderr, "Error creating thread %d of %d: %s.\n", i + 1, NTHREADS, strerror(rc));
            exit(EXIT_FAILURE);
        }
    }

    /* The thread attribute set is no longer needed. */
    pthread_attr_destroy(&attrs);

    /* Reap the threads. */
    for (i = 0; i < NTHREADS; i++) {
        pthread_join(thread[i], NULL);
    }

    /* All done. */
    return EXIT_SUCCESS;
}

If you compile and run this last example, you'll notice that the output may be in odd order, but each i is unique, and each n = 0 through n = LOOPS-1 occurs exactly NTHREADS times.

It looks like someone really does not want to see any answers to the stated question, looking at the downvoting spree... — Nominal Animal, Sep 24 '18 at 16:25
I don't really see any reason for the downvote. If I had to hazard a guess, it's probably because of the conversion of int to pointer and then back to int (but that's not a reason I personally agree with or think it justifies a downvte). — P.P, Sep 24 '18 at 17:40
@P.P.: I think the other answer(s) and the question itself were downvoted at approximately the same time. I really should find a better way to reword this answer so that it emphasizes that the last one is what I recommend, not just for such simple cases, but for actual real-world thread workers in real-world applications. (Any suggestions are warmly welcome!) I just believe that the two prior examples are "necessary", as they are what many tutorials suggest, and what one often sees in real-world code. — Nominal Animal, Sep 24 '18 at 18:17

score 0 · Answer 2 · answered Sep 23 '18 at 08:57

0

Because the index value already changed when "busy" function started. It is better to have separate copy of parameters you are passing to thread proc.

answered Sep 23 '18 at 08:57

Vladimir Pustovalov

335
3
7

I don't believe this is correct. It does not explain two indexes of 0. (It would explain two indexes of 1). – jww Sep 23 '18 at 16:47
Your last two indexes are 0, because you have another cycle which started from 0 and in this point you wait until thread is stopped. – Vladimir Pustovalov Oct 10 '18 at 15:27

How index is affected when multiple threads are running in a loop

2 Answers2

Linked