
I learned that setting a non-zero value in /proc/sys/kernel/sched_child_runs_first will force the child process to run before the parent. However, it doesn't seem to be working. Here is my code:

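#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <sys/types.h>
#include <unistd.h>     /* fork() */

int main(int argc, char **argv)
{
  pid_t child_pid;

  switch(child_pid = fork())
    {
    case 0:                 /* child */
      printf("In Child\n");
      exit(0);

    case -1:                /* fork failed: report and stop, don't fall through */
      perror("Could not fork()");
      return 1;

    default:                /* parent */
      printf("In parent\n");
      break;
    }
  return 0;
}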

The output I get is always:

In parent
In Child

Am I expecting something wrong here?

PS: I am just experimenting to see whether it works, so please don't suggest other synchronization mechanisms or explain why this is a bad idea.

Johannes Schaub - litb
    Did you go through this: http://yarchive.net/comp/linux/child-runs-first.html – Aman Deep Gautam Jun 30 '13 at 14:16
  • @AmanDeepGautam Thanks for the link. Yes, I read most of that in Michael Kerrisk's book as well. But he suggested that setting the value in the above file "guarantees" that the child will run first, or maybe I misread. –  Jun 30 '13 at 14:23
  • But the link says they are removing support for this from kernel version 2.6, and I could not find a newer post saying it had been added back. So maybe it is not possible this way, but I am not an authority on this matter. :) – Aman Deep Gautam Jun 30 '13 at 14:28
  • @AmanDeepGautam Perhaps; I will need to dig deeper unless someone can confirm. Thanks anyway. –  Jun 30 '13 at 14:30
  • Using the following commands on a Raspberry Pi Zero with Raspbian:
        $ sudo echo "1" > /proc/sys/kernel/sched_child_runs_first
        $ ./fork_whos_on_first 10000 > fork.txt
        $ ./fork_whos_on_first.count.awk fork.txt
    I get the following numbers (vs. 100%, 100% for "0"): child 21 (0.21%), parent 9979 (99.79%). – tamberg Oct 20 '18 at 11:11
  • Here's an email from Linus Torvalds https://yarchive.net/comp/linux/child-runs-first.html explaining some trade-offs and stating that there are no guarantees. – tamberg Oct 20 '18 at 11:12

1 Answer


From what I can make out, the sched_child_runs_first feature is implemented in the task_fork_fair function of the CFS scheduler (kernel/sched/fair.c).

The key part of that function looks like this:

if (curr)
        se->vruntime = curr->vruntime;
place_entity(cfs_rq, se, 1);

if (sysctl_sched_child_runs_first && curr && entity_before(curr, se)) {
        swap(curr->vruntime, se->vruntime);
        resched_task(rq->curr);
}

se is the new scheduling entity and curr is the scheduling entity for the current task.

Note that the vruntime for the new entity is first initialised to the same value as that of the current task. This is significant because the entity_before call checks whether the vruntime of curr is less than the vruntime of se.

So the only way that condition will succeed is if the place_entity call sets the vruntime of se to something larger. So let's look at the source for that. The key bits are:

u64 vruntime = cfs_rq->min_vruntime;

if (initial && sched_feat(START_DEBIT))
        vruntime += sched_vslice(cfs_rq, se);

se->vruntime = max_vruntime(se->vruntime, vruntime);

So, assuming the START_DEBIT feature is set (which seems to be the case), the vruntime will be set to the run queue's min_vruntime plus whatever the sched_vslice call returns. If that is greater than the current task's vruntime, we're set; if not, we're left with our initial vruntime value and the condition won't succeed.

I don't understand Linux scheduling well enough to say for sure, but I'm guessing that min_vruntime plus sched_vslice just isn't large enough most of the time.

I say most of the time because, when I was testing, I was able to get the child process to run first at least some of the time. So it's possible the sched_child_runs_first parameter does make a difference; it's just not a guarantee of anything.

The other possibility is that it's a bug in the code, and they should have started with the current task's vruntime rather than the run queue's min_vruntime when calculating the initial value in the place_entity function. That would have guaranteed the condition would succeed. But I suspect there's a reason for doing things the way they do which I just don't understand.

James Holderness