87

I have been trying to understand fork() behavior. This time in a for-loop. Observe the following code:

#include <stdio.h>

void main()
{
   int i;

   for (i=0;i<3;i++)
   {
      fork();

      // This printf statement is for debugging purposes
      // getppid(): gets the parent process-id
      // getpid(): get child process-id

      printf("[%d] [%d] i=%d\n", getppid(), getpid(), i);
   }

   printf("[%d] [%d] hi\n", getppid(), getpid());
}

Here is the output:

[6909][6936] i=0
[6909][6936] i=1
[6936][6938] i=1
[6909][6936] i=2
[6909][6936] hi
[6936][6938] i=2
[6936][6938] hi
[6938][6940] i=2
[6938][6940] hi
[1][6937] i=0
[1][6939] i=2
[1][6939] hi
[1][6937] i=1
[6937][6941] i=1
[1][6937] i=2
[1][6937] hi
[6937][6941] i=2
[6937][6941] hi
[6937][6942] i=2
[6937][6942] hi
[1][6943] i=2
[1][6943] hi

I am a very visual person, and so the only way for me to truly understand things is by diagramming. My instructor said there would be 8 hi statements. I wrote and ran the code, and indeed there were 8 hi statements. But I really didn’t understand it. So I drew the following diagram:

enter image description here

Diagram updated to reflect comments :)

Observations:

  1. Parent process (main) must iterate the loop 3 times. Then printf is called
  2. On each iteration of parent for-loop a fork() is called
  3. After each fork() call, i is incremented, and so every child starts a for-loop from i before it is incremented
  4. At the end of each for-loop, "hi" is printed

Here are my questions:

  • Is my diagram correct?
  • Why are there two instances of i=0 in the output?
  • What value of i is carried over to each child after the fork()? If the same value of i is carried over, then when does the "forking" stop?
  • Is it always the case that 2^n - 1 would be a way to count the number of children that are forked? So, here n=3, which means 2^3 - 1 = 8 - 1 = 7 children, which is correct?
lucidgold
  • 4,432
  • 5
  • 31
  • 51
  • Why not run it and print out `i`, the PID and parent PID after the `fork()`. It should be easy to track what's happening – Basic Nov 07 '14 at 03:14
  • 3
    @Basic, that’s the first thing I did. I even used getpid() and getppid() and that's how/why I believe my diagram is correct. But I really want someone to verify this. – lucidgold Nov 07 '14 at 03:16
  • That's a really nice diagram. Did you make it with dot/graphviz? – Steven Lu Nov 10 '14 at 17:08
  • 2
    I used Microsoft Visio. But I now use LibreOffice Draw, very similar to Open Office Draw and both are open source project I believe, thus free! – lucidgold Nov 10 '14 at 17:15
  • 1
    How did you disable the buffering by default ? On the gfg ide, the outputs are different without fflush - http://ide.geeksforgeeks.org/0TWiEZ, and with fflush - http://ide.geeksforgeeks.org/0JKaH5 – Udayraj Deshmukh Sep 18 '17 at 05:21

3 Answers3

51

Here's how to understand it, starting at the for loop.

  1. Loop starts in parent, i == 0

  2. Parent fork()s, creating child 1.

  3. You now have two processes. Both print i=0.

  4. Loop restarts in both processes, now i == 1.

  5. Parent and child 1 fork(), creating children 2 and 3.

  6. You now have four processes. All four print i=1.

  7. Loop restarts in all four processes, now i == 2.

  8. Parent and children 1 through 3 all fork(), creating children 4 through 7.

  9. You now have eight processes. All eight print i=2.

  10. Loop restarts in all eight processes, now i == 3.

  11. Loop terminates in all eight processes, as i < 3 is no longer true.

  12. All eight processes print hi.

  13. All eight processes terminate.

So you get 0 printed two times, 1 printed four times, 2 printed 8 times, and hi printed 8 times.

Crowman
  • 25,242
  • 5
  • 48
  • 56
  • 2
    Awesome, so my diagram is correct. However, this explains why there are two instances of i=0 (because I always print current value of i). Even-though i=0 is carried over, the next fork only executes when i is incremented! Thanks! – lucidgold Nov 07 '14 at 04:04
  • 4
    Yes, `fork()` just duplicates the process and they both keep going in the same way. *Both of them have just returned from a `fork()` call*, and will not make another call until they next time they come across a call to `fork()`. The only difference is that `fork()` returned 0 in the child, and something else in the parent, but as far as each process is concerned, they both just returned from a `fork()` and then keep on going. – Crowman Nov 07 '14 at 04:08
  • @PaulGriffiths According to your explanation it is going to be a balanced tree. isn't it but the order of execution would vary since order is not deterministic but in essence it's balanced. Isn't it? – Naseer Oct 13 '15 at 20:25
  • @khan: My explanation doesn't speak to when any of these things will happen, just how many times they will happen. You are correct that the order of execution will generally not be predictable. – Crowman Oct 13 '15 at 20:38
  • @PaulGriffiths for the sake of argument.let us assume that these processes are represented by nodes of tree.Then my question from you is that we know that order would not be predictable but we are sure that if fork does not fail those node would be activated sooner or later thereby printing the value.So balanced tree should be there.Shouldn't that? – Naseer Oct 13 '15 at 20:43
  • @khan: No, because the root node would have three branches. The branch resulting from its first `fork()` would have a depth of 3, and the branch resulting from its last `fork()` would have a depth of 1, so it wouldn't be balanced. – Crowman Oct 13 '15 at 20:57
  • @PaulGriffiths this is the key idea that I am not getting could you elaborate it a bit more or tell me a link where I could see this.I am not getting this point. – Naseer Oct 13 '15 at 21:00
  • 1
    @khan: Not sure how to elaborate more than the answer. Just draw it out. Start with one node, and then three times, add a child to every node you can see. You'll end up with eight nodes: one with three children, one with two, two with one, and four with none. – Crowman Oct 13 '15 at 21:09
12
  1. Yes, it's correct. (see below)
  2. No, i++ is executed after the call of fork, because that's the way the for loop works.
  3. If all goes successfully, yes. However, remember that fork may fail.

A little explanation on the second one:

for (i = 0;i < 3; i++)
{
   fork();
}

is similar to:

i = 0;
while (i < 3)
{
    fork();
    i++;
}

So i in the forked processes(both parent and child) is the value before increment. However, the increment is executed immediately after fork(), so in my opinion, the diagram could be treat as correct.

Yu Hao
  • 119,891
  • 44
  • 235
  • 294
  • So, since i is incremented after the call of fork(), then i in a child will be the last value of i from parent? So my diagram is NOT correct? – lucidgold Nov 07 '14 at 03:22
  • @lucidgold That's why I told you to put in a printf. – 2501 Nov 07 '14 at 03:22
  • @2501: I did, but its not very clear. In the print statements there are two instance of i=0, but if I sketch a diagram when i=0 then I do not get 8 hi statements. – lucidgold Nov 07 '14 at 03:24
  • @2501: I know but it doesn’t make sense visually to me. So if i=0 gets carried over to every child, then when does the "forking" stop? – lucidgold Nov 07 '14 at 03:32
  • 1
    @lucidgold I think you diagram is correct, forking stops because i is immediately after the call incremented. Yu Hao explained it well. – 2501 Nov 07 '14 at 03:35
  • @2501: His answer definitely cleared up a few things. My issue now is why are there two instances of i=0? – lucidgold Nov 07 '14 at 03:42
  • 1
    @lucidgold: Because you `printf()` *after* the `fork()`. `i` will be `0` in both the parent and the first child, and they'll both execute that `printf()` call. What you're missing in your diagram is that the first red child should have an `i == 0`, but that box won't `fork()`. Same with the second red child, it should have an `i == 1` box for the `printf()`, but that box won't `fork()`, either. – Crowman Nov 07 '14 at 03:43
  • @PaulGriffiths: I am following, but why would both instances NOT fork()? – lucidgold Nov 07 '14 at 03:50
  • 1
    @lucidgold: Because by the time the first child is created, the first `fork()` has already happened. The first child will `fork()` for the first time on the next iteration of the loop, when `i == 1`. But it will complete the first iteration of the loop before it does that (because it was created in the middle of that iteration), so it'll `printf()` the `i=0`. – Crowman Nov 07 '14 at 03:51
5

To answer your questions one by one:

Is my diagram correct?

Yes, essentially. It's a very nice diagram, too.

That is to say, it's correct if you interpret the i=0 etc. labels as referring to full loop iterations. What the diagram doesn't show, however, is that, after each fork(), the part of the current loop iteration after the fork() call is also executed by the forked child process.

Why are there two instances of i=0 in the output?

Because you have the printf() after the fork(), so it's executed by both the parent process and the just forked child process. If you move the printf() before the fork(), it will only be executed by the parent (since the child process doesn't exist yet).

What value of i is carried over to each child after the fork()? If the same value of i is carried over, then when does the "forking" stop?

The value of i is not changed by fork(), so the child process sees the same value as its parent.

The thing to remember about fork() is that it's called once, but it returns twice — once in the parent process, and once in the newly cloned child process.

For a simpler example, consider the following code:

printf("This will be printed once.\n");
fork();
printf("This will be printed twice.\n");
fork();
printf("This will be printed four times.\n");
fork();
printf("This will be printed eight times.\n");

The child process created by fork() is an (almost) exact clone of its parent, and so, from its own viewpoint, it "remembers" being its parent, inheriting all of the parent process's state (including all variable values, the call stack and the instruction being executed). The only immediate difference (other than system metadata such as the process ID returned by getpid()) is the return value of fork(), which will be zero in the child process but non-zero (actually, the ID of the child process) in the parent.

Is it always the case that 2^n - 1 would be a way to count the number of children that are forked? So, here n=3, which means 2^3 - 1 = 8 - 1 = 7 children, which is correct?

Every process that executes a fork() turns into two processes (except under unusual error conditions, where fork() might fail). If the parent and child keep executing the same code (i.e. they don't check the return value of fork(), or their own process ID, and branch to different code paths based on it), then each subsequent fork will double the number of processes. So, yes, after three forks, you will end up with 2³ = 8 processes in total.

Ilmari Karonen
  • 49,047
  • 9
  • 93
  • 153