Linux kernel: schedule() function

Question

I have seen several other questions on the forum that talk about this schedule() function, but my question is a bit different. I have seen several discussions and literature about it's theoretical, algorithmic and implementation aspects.

The thing which is not clear and the question is asked about, is the execution aspect. Of course looking at the source of the kernel in depth and doing all the required debugging, tracing bla bla... might answer this question but it seem not wise to reinvent the wheel.

The questions/confusions are as following:

What is the path traversed by a multi-threaded user program at kernel level?

Who schedules the threads? Which interrupt context(s)? Any name? (if we see at a trace at kernel level, there is nothing called "sched", but there are swappers, inits, ksoft* bla bla) Deos it go like this:

A process (user program) its child threads all are taken by the kernel at first, then kernel makes them as executable threads (by amalgamating them with schedule() and/or other functions, i.e., this new executable thread has some instructions from the kernel maybe from schedule()/others, embedded along with user task instructions. That makes it scheduled automatically if the situation occurs )

OR

schedule() is always executing at some co-processor to observe and act, if necessary, from that co-processor? That's why, sometimes when we see any two threads switching on a cpu there is only swapper executing in-between and before and after, i.e., there is nothing called scheduler at that level, right?

Thanks for reading and sorry for writing my confusions to share with.

"if we see at a trace at kernel level, there is nothing called "sched", but there are swappers, inits, ksoft"... Those sound like processes or kernel threads. schedule() is a kernel function. — Jonathon Reinhart, Dec 19 '13 at 10:40
schedule() looks at the queue of threads that are waiting to run, and swaps it onto the processor. It's typically called from some sort of interrupt context. — Jonathon Reinhart, Dec 19 '13 at 10:41
Thanks a lot @JonathonReinhart: Logical, Which interrupt contexts may call it? Do these context appear in the trace .. I mean, are they already there, any names? Again thanks for answer and understanding the query.. — tod, Dec 19 '13 at 10:48
Possible duplicate of http://stackoverflow.com/q/20629964/1401351 — Peter, Dec 19 '13 at 14:31
A thread is no different than a separate task with the exception that the `struct mm` is shared; Ie, it has the same view of memory. Typically, the OS arch has a `struct thread_info` on the kernel stack which has pointers to other owned objects. `schedule()` swaps the active kernel stack and updates the VM if it is needed. — artless noise, Dec 20 '13 at 16:03
@artlessnoise Again my question is: if I want to see where a statement, let say: `printf("Hello World!");` from a user program took the cpu time, the answer is: It's probably somewhere when the (task) user binary was occupying the cpu. How about schedule()? — tod, Dec 20 '13 at 17:49
I can not think schedule() is configuring a part of the hardware/control unit then everything happens automatically afterwards. Of course schedule() itself is executed (takes cpu time) to give meaning to its respective instructions (i would say passive source code) and make the desired scheduling happen, right? — tod, Dec 20 '13 at 17:58
@Peter Please read the answer and comments to do a fair deal in the link you have given! btw, that is also my question and over here I have tried again to make my query better phrased with the best of my ability. If it goes against any rule, should i delete the previous one? — tod, Dec 20 '13 at 18:16
Now, you have changed your question? Who gets the run time accounting for `schedule()`? This is different depending on how it is sampled. If you depend on the sub-micron time it takes `schedule()` to execute, then you have other problems. The accounting will be added to who ever the active [`task_struct`](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/sched.h#n1042) is. Look at [switch_to.h](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/switch_to.h). This *arch* magic switches the stack atomic for accounting. — artless noise, Dec 20 '13 at 19:19
@artlessnoise by saying, "The accounting will be added to who ever the active `task_struct` " do you mean, "A process (user program) its child threads all are taken by the kernel at first, then kernel makes them as executable threads (by amalgamating them with schedule()" ? NO, the question is same. To me, of course the one who gets the run time accounting is the one executing it, right? — tod, Dec 20 '13 at 19:29
See [core.c](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c#n2122). The `switch_to` changes the kernel stack. There is always a macro call `current()` which gets the current task. This will get run time incremented when an interrupt samples to see who is running. On one side of `switch_to` it is the old task and on the other it is the new task. See [current.h](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/current.h). `get_thread_info()` has light weight `thread_info`, having a `task_struct` pointer. — artless noise, Dec 20 '13 at 19:49
The task accounting can not be better than the granularity of the system tick unless the timer supplies a mechanism for getting a better resolution; this will depend on **CONFIG** variables. — artless noise, Dec 20 '13 at 19:53
@artlessnoise will you say Yes or No for this? `by saying, "The accounting will be added to who ever the active task_struct " do you mean, "A process (user program) its child threads all are taken by the kernel at first, then kernel makes them as executable threads (by amalgamating them with schedule()" ?` — tod, Dec 20 '13 at 19:53
Thanks @artlessnoise but I think we are not on the same page — tod, Dec 20 '13 at 20:03
Maybe [Linux threads and processes](http://stackoverflow.com/questions/9305992/linux-threads-and-process) can help you? I think you don't understand the data structures? You need to understand them to understand the code. — artless noise, Dec 21 '13 at 00:33

score 8 · Answer 1 · answered Dec 19 '13 at 11:05

X OR Y - neither.

These preemptive, multithreading OS kernels are all much the same overall.

Look at it, (very simply), like this:

The OS kernel scheduler/dispatcher is a big, complex, interrupt-handler. Interrupts, in OS terms come in two flavours:

Hardware interrupts from peripherals like disk, network, keyboard, mouse. These interrupts cause drivers to run, and the drivers may request a scheduling run from the kernel when they exit.

Software interrupts from threads - system calls that can change the state of a thread, eg. a thread may request input that is not immediately available, and so that thread will not run on until the input is available.

When an interrupt does occur, the kernel uses its internal state data, together with the request data from the interrupt, to run its scheduling algorithm and decide which threads should be running on the available cores. If it decides that the set of running threads needs to change, it can stop any thread running on any core by using an inter-core driver to cause a hardware-interrupt of the core running the thread.

If there are no interrupts, the kernel does nothing. It can't do anything because it is not being entered from anywhere. It does not need to execute on any co-processor. It does not need to 'inject' any calls into user code.

It's a state-machine with interrupts as inputs and a set of running threads as outputs.

@Martin James, Thanks Sir, really useful. "It's a state-machine": Does this state-machine configure some control RAM inside the Architecture (May be like a look-up-table) to give this functionality, i.e., at this specific input (interrupt), output this control signal (a signal to select a set of running threads)? Sorry, I am really confused about these things.. — tod, Dec 20 '13 at 11:33
@Martin James: `"decide which threads should be running"` dission making needs CPU, right? ... if I want to see it at micro level, should it appear in the form of some task/process/thread? what is the name of that instance? — tod, Dec 22 '13 at 09:11

Claudio · Answer 2 · 2013-12-19T11:05:33.653

4

Inside the Linux kernel, threads are just processes that share some resources. In other words, for the Linux kernel, threads are just a particular case of processes. The data structure is the same (i.e., task_struct - see include/linux/sched.h).

The system timer is the hardware timer issuing an interrupt at a programmable frequency. Periodically, it issues an hardware interrupt. When the kernel receives such an interrupt, the execution of the current process/thread is interrupted and the system runs the Interrupt Service Routine (ISR) in kernel mode. This routine calls the scheduling functions which choose which process (or thread) should be excecuted in the next time slot. Then, the routine performs a context switch by preempting the currently executing process/thread. This is the basis of multitasking which is the same across all modern general-purpose operating systems.

edited Dec 19 '13 at 11:05

answered Dec 19 '13 at 10:58

Claudio

10,614
4
31
71

You seem to be fixated on the timer driver and have ignored all the other interrupts. This is very misleading, but I'm having a 'no-downvoting' day:) – Martin James Dec 19 '13 at 11:08
@MartinJames It is just as wrong as your answer; it is complex and depends on kernel configuration options. For instance, *state-machine with interrupts as inputs* is contradicted by information in your own answer; a syscall may cause a context switch. For people in IO wait, the other interrupts are important. However, for scheduler work load balance, the timer interrupt is a good thing to fixate upon; besides a task becoming active/sleeping, it is what will cause round robin, etc. – artless noise Dec 20 '13 at 19:26

Linux kernel: schedule() function

2 Answers2