Why doesn't my mutex work properly in a multi-process C application?

Question

I am hacking away at a uni assignment and have come across a problem with my code which is supposed to spawn 2 processes, where the second process waits for the 1st to complete before executing. This is what I have so far:

sem_t mutex;
int producer; int consumer;
sem_init(&mutex, 0, 1);
producer = fork();
consumer = fork();

if (producer == 0) {
    if (VERBOSE) printf("Running producer\n");
    /* down semaphore */
    sem_wait(&mutex);
    /* START CRITICAL REGION */
    get_files(N);
    /* END CRITICAL REGION */
    /* up semaphore */
    sem_post(&mutex);
    if (VERBOSE) printf("Ending producer\n");
    exit(0);
}

if (consumer == 0) {
    if (VERBOSE) printf("Running consumer\n");
    /* down semaphore */
    sem_wait(&mutex);
    /* START CRITICAL REGION */
    /* do stuff */
    /* END CRITICAL REGION */
    /* up semaphore */
    sem_post(&mutex);
    if (VERBOSE) printf("Ending consumer\n");
    exit(0);
}
/* parent waits for both to complete */
wait(NULL);

Now, I know that in the "real-world" this is really stupid. If my 'consumer' does nothing while until my 'producer' is finished, then you might as well not have 2 processes like this, but the assignment is trying to illustrate a race-condition, so that why we've been specifically told to do it this way.

So, my problem is that the consumer process isn't waiting for the producer. I assumed that since the semaphore was taken down in the producer (sem_wait(&mutex);) then it wouldn't be available to the consumer until sem_post(&mutex); is called in the producer.

Additionally, as best as I can tell, the line wait(NULL); isn't waiting for both processes to complete.

Have I critically misunderstood something?

fork shares memory... it should see and be able to use the mutexes just the same no? (i may be well off here, i've not tried it) — tobyodavies, Nov 27 '10 at 13:22
@tobyodavies: I said it because of the phrase "supposed to spawn 2 threads" — tur1ng, Nov 27 '10 at 13:25
@tur1ng, fair enough i suppose thats why it was a comment, not an answer ;) makes sense... — tobyodavies, Nov 27 '10 at 13:28
@tobyodavies: No, `fork()` created processes do not share memory (unless the memory has specifically been mapped as `MAP_SHARED`). — caf, Nov 27 '10 at 13:29
My bad, question modified, we're supposed to be forking PROCESSES, not threads as originally indicated, sorry for this mistake. — Ash, Nov 27 '10 at 13:31
@caf, just re-read the man page and you are right! it uses Copy-on-write (which is probably what confused me)... — tobyodavies, Nov 27 '10 at 13:32
@tobyodavies: are you sure **fork** shares memory? I think it DUPLICATES memory from parent to child... — Pablo Santa Cruz, Nov 27 '10 at 13:33
In a way, forked processes do share memory. But as soon as one of the processes write to it, it will receive its own copy (the whole page will be copied, I believe). This is to speed up the forking process and copy memory only when necessary. — gablin, Nov 27 '10 at 13:38

caf · Answer 1 · 2010-11-27T14:06:55.683

11

You should have error-checking on your semaphore calls. Use perror() to display the error if sem_wait(), sem_init() or sem_post() returns non-zero.

Secondly, you a creating more processes than you think. Your first fork() results in a parent (with producer non-zero) and a child (with producer zero). Both processes then execute the second fork(), so you now have four processes.

Thirdly, the sem_t variable must be shared between the processes, so it must be stored in a shared memory region. The easiest way to achieve this is with mmap():

sem_t *sem = mmap(NULL, sizeof *sem, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

(Execute that prior to the sem_init() and the first fork()).

Forthly, it's not defined which process will run first, so you can't rely on the producer thread calling sem_wait() before the consumer does. Instead, initialise the semaphore to zero with sem_init(), and only call sem_wait() in the consumer - this will block the consumer. The producer executes, and calls sem_post() when it is done, which allows the consumer to proceed.

The sem_init() call should specify pshared as non-zero and value as 0, so it should look like:

if (sem_init(sem, 1, 0) != 0) {
    perror("sem_init");
    exit(1);
}

Fifth, wait(NULL) only waits for a single child process to exit. Call it twice to wait for two child processes.

edited Nov 27 '10 at 14:06

answered Nov 27 '10 at 13:42

caf

233,326
40
323
462

You've actually spotted bug #2, so thanks for that. The question as been modified though to update you on the other elements of your answer. The short of it though, while your correct - in this situation, i dont think thats the problem. – Ash Nov 27 '10 at 13:54
according to O'Reily (http://linuxdevcenter.com/pub/a/linux/2007/05/24/semaphores-in-linux.html?page=5) you only need shared mem if the processes are not parent/child – tobyodavies Nov 27 '10 at 13:59
`sem_wait` will just block and never succeed on a semaphore with an initial value of 0 no? I may have misread the man page, but my understanding was that a semaphore's value represents the number of additional 'users' it can handle so that a semaphore w/ value=1 is equivalent to a mutex... – tobyodavies Nov 27 '10 at 14:06
@tobyodavies: `sem_wait()` will block on a semaphore with a value of 0, until the semaphore is posted. This is exactly the behaviour that is desired here - an initial value of `0`, a `sem_wait()` at the beginning of the consumer and a `sem_post()` at the end of the producer. In this case the value of the semaphore is "number of things that have been produced and not yet consumed". – caf Nov 27 '10 at 14:09
true, i was still trying to fix the first bug, you're way ahead of me – tobyodavies Nov 27 '10 at 14:12
Ok, awesome. Your right, except that the semaphore DOESN'T need to be in the shared memory segment. It's working just fine without it. Thanks so much for your help though, it was great, and you actually saw and solved another bug I was having (producer running twice) so thanks a tonne. The problem was OSX, which apparently doesn't implement `sem_init`. I am using `mutex = sem_open("mutex", O_CREAT, S_IRUSR | S_IWUSR, 1);` and it is working just as expected (but i had to change the waits/posts `sem_wait(mutex);` but im going to give the answer to @tobyodavies (he needs it more than you) :P – Ash Nov 27 '10 at 14:39
@Ash: The semaphore doesn't need to be in shared memory if you're using a named semaphore obtained with `sem_open()` - only unnamed "memory-based" sempahores obtained with `sem_init()` do. – caf Nov 27 '10 at 23:59

tobyodavies · Accepted Answer · 2010-11-27T13:56:07.350

4

Just because you fork the producer thread first doesn't mean the OS will schedule it to run first - its quite possible that the consumer actually runs and gets the lock first.

Also, you should check the return value of sem_wait - it is posible to return from it without holding the semaphore.

It is also quite possible (as several people have noted in comments) that semaphores may simply not work across forked processes

EDIT - if you pass a non-zero value to argument 2 of sem_init(sem_t *sem, int pshared, unsigned value) when initializing posix semaphores will work across processes

EDIT - See here for a much better explanation than i could give, complete with source code to do nearly exactly what you want

edited Nov 27 '10 at 13:56

answered Nov 27 '10 at 13:20

tobyodavies

27,347
5
42
57

Ok, I looked into your suggestion, and it seems as though NEITHER semaphore locks are being granted. Both `sem_wait(&mutex);` lines return a value of -1 which Google tells me means a failure. What should I do now? (P.S. I guess thats why my `wait(NULL);` isn't working either then, huh? – Ash Nov 27 '10 at 13:39
Yes, the point being that you need to actually call sem_init, and check the status of sem_wait. – David Gelhar Nov 27 '10 at 13:41
you should do `while(sem_wait(mutex)!=0);` as a simplest suggestion (note the ';' - loop w/o doing anything) – tobyodavies Nov 27 '10 at 13:41
I've modified the question to include the `sem_init(&mutex, 0, 1);` line which I have in my program. Does it look correct? – Ash Nov 27 '10 at 13:43
no, you need a non-zero second argument to allow semaphore state to be shared between children - i.e. `sem_init(&mutex,TRUE,0)` – tobyodavies Nov 27 '10 at 13:45
Changed the waits to `while(sem_wait(mutex)!=0);` and the app waited on them forever. No luck. – Ash Nov 27 '10 at 13:49
changed the `sem_init(&mutex, 0, 1);` to: `sem_init(&mutex, 1, 1);` - and it didn't make any difference. – Ash Nov 27 '10 at 13:51
is the `sem_init` returning sucessfully? – tobyodavies Nov 27 '10 at 14:02
Checking errno *apparently* the error coming back from the `sem_wait(&mutex)` line is "Bad file descriptor" which seems like a weird error for it to throw. – Ash Nov 27 '10 at 14:07
ah ha! sem_init WAS throwing an error: "Function not implemented" according to errno! But I'm not entirely sure what to do with this knowledge. – Ash Nov 27 '10 at 14:08
see http://stackoverflow.com/questions/2752085/c-semaphores-sem-wait-throwing-inexplicable-error for that error – tobyodavies Nov 27 '10 at 14:10
Sigh - i am using OSX too! Will try using `sem_open` and see what happens. – Ash Nov 27 '10 at 14:13
Does sem_open use the same parameters? Replaced sem_init with sem_open and its now failing with "No such file or directory" Or perhaps is the return not 0 as it is with sem_init? – Ash Nov 27 '10 at 14:19
sem_open behaves _very_ differently to sem_init... see man pages but you should be able to work it out http://opengroup.org/onlinepubs/007908799/xsh/sem_open.html – tobyodavies Nov 27 '10 at 14:27
yup I got it thanks dude, you've been a bigger help than you probably realise. Without your help I possible would have failed, but now its working like I want. – Ash Nov 27 '10 at 14:40

atzz · Answer 3 · 2010-11-27T13:48:56.830

Have you provided complete code in the question?

If so, you are missing semaphore initialization. You have to call either sem_init or sem_open prior to using the semaphore.

Read here.

EDIT You are specifying pshared = 0 in the sem_init call. This makes the semaphore process-local (i.e. it can be used only to synchronize threads of one process). fork creates a child process, so the semaphore does not do anything useful.

If pshared has value 0, then the semaphore is shared between the threads of a process. If pshared is non-zero, then the semaphore is shared between processes.

(the quote is from the link above)

score 0 · Answer 4 · answered Nov 27 '10 at 13:45

First of all, I don't think mutexes will work with processes. The reason being that forked processes don't actually share memory. They will be able to read the same memory as it was before the fork, but whenever one of them writes to memory the new process will receive its own copy.

Second, you might have to initialize your mutexes, or else usage of them might be undefined.

Why doesn't my mutex work properly in a multi-process C application?

4 Answers4