Understanding parallel thread execution

Question

Writing simple C code, trying to control output from two different threads:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t sem;

void* thread_func(void* aArgs)
{
  printf("Entering thread %p with %d\n", (void*)pthread_self(), (int)aArgs);
  int i = 0;
  for(;i < 10; i++)
  {
    sem_wait(&sem);
    if ((i % 2) == (int)aArgs)
      printf("val is %d in thread %p \n", i, (void*)pthread_self());
    sem_post(&sem);
  }
}

int main()
{
  pthread_t thread_1, thread_2;

  sem_init(&sem, 0, 1);

  pthread_create(&thread_1, NULL, (void*)thread_func, (void*)0);
  pthread_create(&thread_2, NULL, (void*)thread_func, (void*)1);

  pthread_join(thread_1, NULL);
  pthread_join(thread_2, NULL);

  sem_destroy(&sem);

  return 0;
}

What i want to achieve, is sequence of intermixed odd and even numbers. But i receive all the numbers from one thread then all another numbers from the second thread, like this (even if i increase loop counter magnitude):

Entering thread 0xb75f2b40 with 0
val is 0 in thread 0xb75f2b40 
val is 2 in thread 0xb75f2b40 
val is 4 in thread 0xb75f2b40 
val is 6 in thread 0xb75f2b40 
val is 8 in thread 0xb75f2b40 
Entering thread 0xb6df1b40 with 1
val is 1 in thread 0xb6df1b40 
val is 3 in thread 0xb6df1b40 
val is 5 in thread 0xb6df1b40 
val is 7 in thread 0xb6df1b40 
val is 9 in thread 0xb6df1b40

The question is why two independent threads behave like they were two sequential tasks? Why the second thread didn't take the execution control until the first were not finished all the stuff?

I've tried to add pthread_yield() to the end of the for-loop, but situation doesn't change significantly: sometimes i get expected output, sometimes - as described above.

UPD. How can i achieve deterministic one-by-one thread execution? Is there any synchronisation primitive for this?

Try flushing the prints somehow. And give sleeps between the prints. Or use locking. — Milind Dumbare, Feb 22 '15 at 22:21
@Milind: The newline characters will already cause the prints to flush. And how would sleeps help? And note that the OP is already using semaphores to lock. — Oliver Charlesworth, Feb 22 '15 at 22:23
newline is just newline character. How is it going to flush? Sorry missed the semaphores. Sleep is just to take chances on automatic flushing of prints — Milind Dumbare, Feb 22 '15 at 22:26
@Milind: Because in a typical environment, stdout is line-buffered. — Oliver Charlesworth, Feb 22 '15 at 22:28
FWIW, when running your code, I don't get the behaviour you're observing. Are you sure this is the *exact* code that you've compiled and run? — Oliver Charlesworth, Feb 22 '15 at 22:32
@OliverCharlesworth It's possible that your threads are being run in parallel. On my (single-threaded) VM I get the same output as OP. — Daniel Kleinstein, Feb 22 '15 at 22:33
@DanielKleinstein: Yeah, perhaps the loop count isn't high enough for a context-switch to kick in for you. — Oliver Charlesworth, Feb 22 '15 at 22:34
Is there any reason they shouldn't be allowed to run sequentially? Maybe there's so little code in each thread that it takes longer to start the second thread than it does for the first thread to finish. — user253751, Feb 22 '15 at 22:38
If you want your threads interrupt each other, just increase execution time of your thread function, like this: http://pastebin.com/0RMGLzcN . You will see output like this: http://pastebin.com/r77Mx8DM . The thing is, kernel scheduler has a parameter called `quantum`, or `time slice`, which is time period between processes switching (**preemption**). You need to make your function execution time to be greater than `time slice` of scheduler. — Sam Protsenko, Feb 22 '15 at 22:45
Few more links: [1] http://en.wikipedia.org/wiki/Preemption_%28computing%29#Time_slice [2] http://stackoverflow.com/questions/16401294/how-to-know-linux-scheduler-time-slice — Sam Protsenko, Feb 22 '15 at 22:52
@SamProtsenko That's very far from an ideal solution - it's not deterministic, portable, guaranteed to work, or efficient. — Daniel Kleinstein, Feb 22 '15 at 22:54
Yeah, I realize that, just wanted to show what is the actual issue. It didn't supposed to be a solution. — Sam Protsenko, Feb 22 '15 at 23:11
"*The question is why two independent threads behave like they were two sequential tasks? Why the second thread didn't take the execution control until the first were not finished all the stuff?*" Because that was what was most efficient on your platform. Do you have some reason to think interleaving would be better? — David Schwartz, Aug 18 '16 at 18:15

score 4 · Accepted Answer · answered Feb 22 '15 at 23:06

4

If you want to get the desired output, you should use two semaphores instead of one. Each thread should wait on its own semaphore and post the other thread's semaphore after it is done with each loop iteration. The main thread can create one semaphore with a value of 1 and the other with a value of zero to start things off right. This will force the two threads to run in an alternating sequence.

As the program is written currently, doing a sem_post followed by a sem_wait is likely to result in the same thread grabbing the semaphore right away (on a single cpu system). I'm surprised that pthread_yield doesn't help, but using two semaphores will guarantee correct ordering no matter what.

answered Feb 22 '15 at 23:06

JS1

4,745
1
14
19

Not really sure how this solves the problem - won't the thread with the semaphore equal to 1 resume its execution immediately, just like in OP's question? It only waits on its own semaphore. – Daniel Kleinstein Feb 22 '15 at 23:22
@DanielKleinstein Thread1 waits on Thread2's semaphore, does the work, and posts on Thread1's semaphore. Thread2 does it the other way around. You just need to initialize the semaphores so one of the thread starts the first loop iteration. – nos Feb 22 '15 at 23:31
@nos Ah, understood. +1, elegant (though not very extensible to more than 2 threads) – Daniel Kleinstein Feb 22 '15 at 23:40
1

I provided code for this solution in separate answer. – Sam Protsenko Feb 23 '15 at 00:00

score 2 · Answer 2 · edited May 23 '17 at 12:22

2

Just want to demonstrate code for the answer by JS1, for any number of threads:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define NUM 3

static sem_t sem[NUM];

static void *thread_func(void *args)
{
    int i;

    for (i = 0; i < 10; ++i) {
        int cur = (long)args;        /* current thread number */
        int next = (cur + 1) % NUM;  /* next thread number*/

        if ((i % NUM) != cur)
            continue;

        sem_wait(&sem[cur]); /* lock this thread's semaphore */
        printf("val is %d, thread num = %ld\n", i, (long)args);
        sem_post(&sem[next]); /* unlock next thread's semaphore */
    }

    return NULL;
}

int main(void)
{
    size_t i;

    pthread_t t[NUM];

    for (i = 0; i < NUM; ++i)
        sem_init(&sem[i], 0, 0); /* locked */

    for (i = 0; i < NUM; ++i)
        pthread_create(&t[i], NULL, thread_func, (void *)i);

    sem_post(&sem[0]);

    for (i = 0; i < NUM; ++i)
        pthread_join(t[i], NULL);

    for (i = 0; i < NUM; ++i)
        sem_destroy(&sem[i]);

    return 0;
}

Output:

val is 0, thread num = 0
val is 1, thread num = 1
val is 2, thread num = 2
val is 3, thread num = 0
val is 4, thread num = 1
val is 5, thread num = 2
val is 6, thread num = 0
val is 7, thread num = 1
val is 8, thread num = 2
val is 9, thread num = 0

edited May 23 '17 at 12:22

Community

1
1

answered Feb 22 '15 at 23:59

Sam Protsenko

14,045
4
59
75

nice solution. And if it'll be three threads, we just should lock executing thread's semaphore and unlock all another thread's semaphores? – ars Feb 23 '15 at 00:02
1

@ars For three threads, each thread should unlock the next thread's semaphore. So thread 0 should unlock thread 1, thread 1 unlocks thread 2, and thread 2 unlocks thread 0. – JS1 Feb 23 '15 at 00:06
@sam Thanks for writing the code to demonstrate what I meant. I'm on a tablet right now so it's hard for me to type out code. – JS1 Feb 23 '15 at 00:08
@ars, I have rewritten the code for any number of threads, just edit `NUM` constant. – Sam Protsenko Feb 23 '15 at 00:13
@JS1, no problem. nice solution, btw! – Sam Protsenko Feb 23 '15 at 00:14

Daniel Kleinstein · Answer 3 · 2015-02-22T23:14:34.923

You keep calling sem_wait and sem_post in the same iteration of your loop, so the thread maintains control over the semaphore for the duration of its time-slice - as soon as sem_post is called, sem_wait is immediately called again (in the same thread) in the following iteration.

Here's a solution to your problem using condition variables:

pthread_mutex_t mut;
pthread_cond_t print_cond;
int print_thread; //equals 0 or 1

These are global variables used to synchronize output between the two threads. print_thread is equal to 0 when we want the first thread to print and is equal to 1 when we want the second thread to print.

And inside thread_func:

for(;i < 10; i++)
{
    pthread_mutex_lock(&mut);
    if ((i % 2) == (int)aArgs){
        while (print_thread != (int)aArgs){
            pthread_cond_wait(&print_cond, &mut);
        }
        printf("val is %d in thread %p \n", i, (void*)pthread_self());
        print_thread = 1 - (int)aArgs;
        pthread_cond_signal(&print_cond);
        pthread_mutex_unlock(&mut);
    } else {
        pthread_mutex_unlock(&mut);
    }
}

With this code, you should get output similar to the following:

Entering thread 0xb6fbcb70 with 1
Entering thread 0xb77bdb70 with 0
val is 0 in thread 0xb77bdb70 
val is 1 in thread 0xb6fbcb70 
val is 2 in thread 0xb77bdb70 
val is 3 in thread 0xb6fbcb70 
val is 4 in thread 0xb77bdb70 
val is 5 in thread 0xb6fbcb70 
val is 6 in thread 0xb77bdb70 
val is 7 in thread 0xb6fbcb70 
val is 8 in thread 0xb77bdb70 
val is 9 in thread 0xb6fbcb70

Note that this solution extends well to printing with more than just two threads: the only changes necessary are updating print_thread appropriately.

I'm not sure this is correct. As soon as one thread posts, the other waiting thread should immediately take over. — Oliver Charlesworth, Feb 22 '15 at 22:29
@Daniel Kleinstein: nothing changed, when i move calls to sem_wait() and sem_post() under the if() statement. — ars, Feb 22 '15 at 22:30
@OliverCharlesworth You're not guaranteed that - pending on a sempahore isn't a fifo queue on most systems, and unblocking a task that pends on a semaphore doesn't necessarily force a context switch to that process - so the process that just unblocked the mutex could very well grab it right away again. Though on a multiprocessor system you'd have a higher chance the other thread gets woken up and grabs the mutex. — nos, Feb 22 '15 at 23:15

score 0 · Answer 4 · edited Feb 22 '15 at 23:41

pthread_yield() is a non-standard call, use sched_yield() from sched.h instead.

Also I would init the semaphore at 0 and call sem_post after the creation of the two threads.

So the thread's code looks like

for(;i < 10; i++)
{
    sem_wait(&sem);
    if ((i % 2) == (int)aArgs)
        printf("val is %d in thread %p \n", i, (void*)pthread_self());
    sem_post(&sem);
    sched_yield();
}

And in the main:

sem_init(&sem, 0, 0);

pthread_create(&thread_1, NULL, (void*)thread_func, (void*)0);
pthread_create(&thread_2, NULL, (void*)thread_func, (void*)1);

sem_post(&sem);

What is get is:

Entering thread 0x7f74c7697700 with 0
val is 0 in thread 0x7f74c7697700 
Entering thread 0x7f74c6e96700 with 1
val is 2 in thread 0x7f74c7697700 
val is 4 in thread 0x7f74c7697700 
val is 1 in thread 0x7f74c6e96700 
val is 3 in thread 0x7f74c6e96700 
val is 5 in thread 0x7f74c6e96700 
val is 7 in thread 0x7f74c6e96700 
val is 6 in thread 0x7f74c7697700 
val is 9 in thread 0x7f74c6e96700 
val is 8 in thread 0x7f74c7697700

Doesn't work for me. It looks like scheduler just returns to first thread each time after rescheduling, so it highly depends on scheduler strategy and you CPU frequency. So this answer is not deterministic. — Sam Protsenko, Feb 22 '15 at 23:15
OP asked how he could get each thread to print one line at a time. — Daniel Kleinstein, Feb 22 '15 at 23:16

Understanding parallel thread execution

4 Answers4