Pthread synchronization using mutex not syncing words properly

Question

The goal of this program is to copy a string taken in by user input word for word using multithreading. Each thread copies every fourth word so for instance the first thread copies the first and fifth words, the second copies the second and sixth words, etc. I have done quite a bit of research on mutex and I believe I have implemented the mutex lock properly however the string still comes up as jumbled nonsense when it prints. Can someone shed some light as to why the threads aren't synchronizing?

#include <stdio.h>
#include <pthread.h>
#include <string.h>
#include <stdlib.h>

void *processString(void *);

char msg1[100];
char msg2[100];
char * reg;
char * token;
char * tokens[10];
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t = PTHREAD_COND_INITIALIZER;

int main(){
    int i = 0, j;
    pthread_t workers[4];

    printf("Please input a string of words separated by whitespace characters: \n");
    scanf("%99[^\n]", msg1); //take in a full string including whitespace characters

    //tokenize string into individual words
    token = strtok(msg1, " ");  
    while(token != NULL){
        tokens[i] = (char *) malloc (sizeof(token));
        tokens[i] = token;
        token = strtok(NULL, " ");
        i++;
    }

    for(j = 0; j < 4; j++){
        if(pthread_create(&workers[j], NULL, processString, (void *) j))
            printf("Error creating pthreads");
    }

    for(i = 0; i < 4; i++){
        pthread_join(workers[i], NULL);
    }
    pthread_mutex_destroy(&lock);

    printf("%s\n", msg2);

    return 0;
}

//each thread copies every fourth word
void *processString(void *ptr){
    int j = (int) ptr, i = 0;

    pthread_mutex_lock(&lock);

    while(tokens[i * 4 + j] != NULL){   
        reg = (char *) malloc (sizeof(tokens[i * 4 + j]));
        reg = tokens[i * 4 + j];
        strcat(msg2, reg);
        strcat(msg2, " ");
        i++;
    }

    pthread_mutex_unlock(&lock);
    return NULL;
}

If you want to control the order in which the threads run, you'll need something like a condition-variable or semaphores. Mutexes only ensure mutual exclusion, not order. — EOF, Mar 21 '16 at 18:32
What EOF said. But also, your code leaks memory. At each iteration of its loop, function `processString()` allocates a block of memory, assigns a pointer to that block to variable `reg`, and then immediately overwrites that pointer with a different one, thus losing the only pointer to the allocated block. That's just as well in one sense, however, because you're allocating space sufficient for one *pointer*, whereas it appears that the (misguided) intent was to allocate space sufficient for the pointed-to string. — John Bollinger, Mar 21 '16 at 18:47
Additionally, your locking is wrong. Once each thread acquires the mutex, it will perform *all* of its work before releasing it. The mutex is effective in preventing the resulting string from being corrupted, and in that sense you are using it correctly. But only if there are fewer than five words in the input is there any chance of getting the desired output. — John Bollinger, Mar 21 '16 at 18:52
@JohnBollinger Apart from the leaking memory (which I have fixed thanks for the tip) that was my problem. Do I need to use mutex conditions to fix this? — Turbotec, Mar 22 '16 at 17:06

score 1 · Accepted Answer · edited May 23 '17 at 12:07

As @EOF wrote in comments, mutexes provide only mutual exclusion. They prevent multiple cooperating threads from running concurrently, but they do not inherently provide any control over the order in which they are acquired by such threads. Additionally, as I described in comments myself, mutexes do provide mutual exclusion: if one thread holds a mutex then no other thread will be able to acquire that mutex, nor proceed past an attempt to do so, until that mutex is released.

There is no native synchronization object that provides directly for making threads take turns. That's not usually what you want threads to do, after all. You can arrange for it with semaphores, but that gets messy quickly as you add more threads. A pretty clean solution involves using a shared global variable to indicate which thread's turn it is to run. Access to that variable must be protected by a mutex, since all threads involved must both read and write it, but there's a problem with that: what if the thread that currently holds the mutex is not the one whose turn it is to run?

It is possible for all the threads to loop, continuously acquiring the mutex, testing the variable, and either proceeding or releasing the mutex. Unfortunately, such a busy wait tends to perform very poorly, and in general, you can't be confident that the thread that can make progress at any given point in the execution will manage to acquire the mutex in bounded time.

This is where condition variables come in. A condition variable is a synchronization object that permits any number of threads to suspend activity until some condition is satisfied, as judged by another, non-suspended thread. Using such a tool avoids the performance-draining busy-wait, and in your case it can help ensure that all your threads get their chance to run in bounded time. The general-purpose per-thread usage model for condition variables is as follows:

acquire the mutex protecting the shared variable(s) by which to judge whether I can proceed
Test whether I can proceed. If so, jump to step 5.
I can't proceed right now. Perform a wait on the condition variable.
I have awakened from the wait; go back to step 2.
Do the work I need to do.
Broadcast a signal to wake all threads waiting on the condition variable.
Release the mutex.

Variations on that are possible, but I recommend that you do not vary from it until and unless you know exactly why you want to do so, and exactly why the variation you have in mind is safe. Note, too, that when a thread performs a wait on a condition variable associated with a given mutex, it automatically releases that mutex while it waits, and re-acquires it before returning from the wait. This allows other threads to proceed in the meantime, and, in particular, to wait on the same condition variable.

As it applies to your problem, the shared state you want your threads to test is the aforementioned variable that indicates which thread's turn it is, and the the condition you want your threads to wait on is that it has become a different thread's turn (but this is implicit in the way you use the condition variable; condition variables themselves are generic). Note also that this means that part of the work each thread must do before signaling the other threads is to update which thread's turn it is. And since each thread may need to take multiple turns, you will want to wrap the whole procedure in a loop.

Thanks! That was exactly what I needed. My program is doing exactly what I needed it to do. :) — Turbotec, Mar 23 '16 at 00:12

Pthread synchronization using mutex not syncing words properly

1 Answers1