From what I can make out, the sched_child_runs_first feature is implemented in the task_fork_fair function of the kernel's fair (CFS) scheduler.
The key part of that function looks like this:
    if (curr)
        se->vruntime = curr->vruntime;
    place_entity(cfs_rq, se, 1);
    if (sysctl_sched_child_runs_first && curr && entity_before(curr, se)) {
        swap(curr->vruntime, se->vruntime);
        resched_task(rq->curr);
    }
Here se is the new scheduling entity and curr is the scheduling entity for the current task.
Note that the vruntime of the new entity is first initialised to the same value as the current task's. This is significant, because the entity_before call checks whether the vruntime of curr is less than the vruntime of se.
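To make that comparison concrete, here is a minimal userspace sketch of it. The struct entity type is a hypothetical stand-in for the kernel's struct sched_entity and models only the vruntime field; the comparison itself follows the signed-difference form the kernel uses, as far as I can tell.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct sched_entity:
 * only the vruntime field discussed above is modelled. */
struct entity {
    uint64_t vruntime;
};

/* True when a's vruntime is smaller than b's. The subtraction is done
 * on unsigned values and then interpreted as signed, so the result stays
 * correct even if the 64-bit counter has wrapped around. */
static bool entity_before(const struct entity *a, const struct entity *b)
{
    return (int64_t)(a->vruntime - b->vruntime) < 0;
}
```

Note that two entities with equal vruntime are not "before" each other, which is exactly the situation right after se->vruntime = curr->vruntime.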
So the only way that condition can succeed is if the place_entity call sets the vruntime of se to something larger. So let's look at the source for that. The key bits are:
    u64 vruntime = cfs_rq->min_vruntime;
    if (initial && sched_feat(START_DEBIT))
        vruntime += sched_vslice(cfs_rq, se);
    se->vruntime = max_vruntime(se->vruntime, vruntime);
So assuming the START_DEBIT feature is set (which seems to be the case), the candidate vruntime is the run queue's min_vruntime plus whatever the sched_vslice call returns. If this is greater than the current task's vruntime then we're set. If not, max_vruntime leaves se with its initial value, which is equal to curr's vruntime, and the condition won't succeed.
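The whole decision can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: START_DEBIT is taken to be enabled, the sched_vslice result is passed in as a plain number (vslice), and child_runs_first is a hypothetical helper name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe maximum of two vruntime values. */
static uint64_t max_vruntime(uint64_t a, uint64_t b)
{
    return (int64_t)(b - a) > 0 ? b : a;
}

/* Hypothetical helper: returns true if the swap branch quoted from
 * task_fork_fair would be taken for the given inputs.
 * Assumes START_DEBIT is on; vslice stands in for sched_vslice(). */
static bool child_runs_first(uint64_t min_vruntime,
                             uint64_t curr_vruntime,
                             uint64_t vslice)
{
    /* se->vruntime = curr->vruntime; */
    uint64_t se_vruntime = curr_vruntime;

    /* place_entity(): start from min_vruntime, add the debit,
     * and never move the entity backwards. */
    uint64_t placed = min_vruntime + vslice;
    se_vruntime = max_vruntime(se_vruntime, placed);

    /* entity_before(curr, se) */
    return (int64_t)(curr_vruntime - se_vruntime) < 0;
}
```

With these definitions, child_runs_first(1000, 1000, 50) is true because the parent sits at min_vruntime and the debit pushes the child past it, whereas child_runs_first(1000, 5000, 50) is false because the parent is already well ahead of min_vruntime plus the slice.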
I don't understand Linux scheduling well enough to say for sure, but I'm guessing that min_vruntime plus sched_vslice just isn't large enough most of the time, because the forking task's vruntime is usually already further ahead than that.
I say most of the time because, when I was testing, I was able to get the child process to run first at least some of the time. So it's possible the sched_child_runs_first parameter does make a difference; it's just not a guarantee of anything.
The other possibility is that it's a bug in the code, and they should have started with the current task's vruntime rather than the run queue's min_vruntime when calculating the initial value in the place_entity function. That would have guaranteed the condition would succeed (as long as sched_vslice returns something non-zero). But I suspect there's a reason for doing things the way they do which I just don't understand.
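For comparison, here is a sketch of that hypothetical alternative, where the debit is added to the parent's vruntime instead of min_vruntime. This is not what the kernel does; child_runs_first_alt is an invented name, and vslice again stands in for the sched_vslice result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variant, NOT kernel behaviour: the child's starting
 * vruntime is the parent's vruntime plus the debit, so min_vruntime
 * plays no part in the decision. */
static bool child_runs_first_alt(uint64_t curr_vruntime, uint64_t vslice)
{
    uint64_t se_vruntime = curr_vruntime + vslice;

    /* Same wrap-safe comparison as entity_before(curr, se). */
    return (int64_t)(curr_vruntime - se_vruntime) < 0;
}
```

In this variant the result depends only on vslice being non-zero: child_runs_first_alt(5000, 50) is true however far ahead of min_vruntime the parent is, while a zero slice still leaves the two values equal and the condition false.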