Fundamentally, every single async function that releases a thread ultimately compiles down to a callback, normally executed by the OS.
In modern terminology, this style is often called a Promise, but it has been part of all good operating systems since time immemorial. The general method is to take a callback function and register it, then start some kind of operation. When the operation completes, the callback is called.
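To make the "register a callback, start the operation, get called back" pattern concrete, here is a minimal C# sketch. `StartOperation` is a hypothetical callback-style API invented for illustration (a thread-pool work item stands in for the OS or hardware), and `StartOperationAsync` shows one common way such a callback can be surfaced as a `Task` using `TaskCompletionSource`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallbackDemo
{
    // Hypothetical callback-style operation (not a real .NET API).
    // It registers the callback, starts the work, and returns immediately.
    public static void StartOperation(int input, Action<int> onCompleted)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            Thread.Sleep(50);          // stands in for the device doing its work
            onCompleted(input * 2);    // "the operation completed": fire the callback
        });
    }

    // Wraps the callback in a Task so that callers can await it.
    public static Task<int> StartOperationAsync(int input)
    {
        var tcs = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        StartOperation(input, result => tcs.SetResult(result));
        return tcs.Task;
    }
}
```

A caller would simply write `int doubled = await CallbackDemo.StartOperationAsync(21);` and no thread is held by the caller while the operation is in flight.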
This goes all the way down to the processor level, where IO devices signal an interrupt line, which feeds through to the OS kernel, the kernel-mode drivers, user-mode drivers and finally some kind of wait handle that an application thread is waiting on (such as window messages or async IO).
Let's take a deeper look at one of the main examples to see how it's done. We'll go through the main .NET GitHub repo, as well as the Win32 docs on MSDN. Similar principles apply to most modern OSes. I'm going to assume a fair understanding already of basic IO operations and the basic components of modern PCs.
Bulk IO classes such as FileStream, Socket, PipeStream, SerialPort
These use quite similar methods. Let's look at just FileStream.
Going through the source, it utilizes a class called AsyncWindowsFileStreamStrategy, which in turn utilizes a Win32 mechanism called Overlapped IO. It eventually passes through a callback function to ThreadPoolBoundHandle.AllocateNativeOverlapped, and takes the resulting OVERLAPPED struct to pass to Win32 APIs such as ReadFile.
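From the application side, all of that machinery sits behind an ordinary `ReadAsync` call. The following is a minimal sketch, not the runtime's own code: `FileReadDemo` and its method name are made up for illustration, and it assumes a UTF-8 text file. `FileOptions.Asynchronous` is the flag that asks for a handle opened for overlapped IO on Windows.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FileReadDemo
{
    // Hypothetical helper: reads a whole UTF-8 file using asynchronous IO.
    public static async Task<string> ReadAllTextOverlappedAsync(string path)
    {
        // FileOptions.Asynchronous requests an asynchronous (overlapped) handle.
        await using var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            4096, FileOptions.Asynchronous);

        var buffer = new byte[4096];
        using var collected = new MemoryStream();
        int read;
        // If a read cannot complete immediately, the thread is released at the
        // await and the rest of the loop resumes later as a continuation.
        while ((read = await stream.ReadAsync(buffer.AsMemory(0, buffer.Length))) > 0)
        {
            collected.Write(buffer, 0, read);
        }
        return Encoding.UTF8.GetString(collected.ToArray());
    }
}
```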
We don't have the source for Win32, but on a general level, these functions will call through to the Kernel32 and ntdll APIs. These in turn move into kernel-mode, where file-system drivers pass over to disk drivers.
The system that most bulk IO hardware like drives and network adapters use is Direct Memory Access. The driver will just tell the hardware where in RAM to place the data. The hardware loads the data directly to RAM, completely bypassing the CPU.
It then signals an interrupt line to the CPU, which stops what it was doing and transfers control to the kernel's interrupt handler. This then transfers control back up the chain to the drivers, back into user-mode, and eventually the callback in the application is ready to go.
What picks up the callback in the application? The ThreadPool class (the native version), which uses an IO Completion Port (this is used to merge lots of IO callbacks into a single handle to wait upon). The native-level threads in our application continuously loop on a call to GetQueuedCompletionStatus, which blocks if there is nothing available. As soon as it returns, the relevant callback is fired, which feeds all the way back up to our FileStream and ultimately continues our function where we left off, as will be seen later.
This may or may not be on our original native thread, depending on how we have set up our SynchronizationContext. If we need to marshal a callback to the UI thread, this is done via a window message.
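To illustrate how a SynchronizationContext brings a continuation back to a particular thread, here is a rough sketch of a single-threaded context. `QueueSynchronizationContext` is a hypothetical class written for this example; its in-memory queue plays the role that the window message queue plays for a real UI thread, and it is deliberately simplified (no `Send` override and no handling of callbacks posted after the loop has finished).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context: continuations are posted to a queue and run by one thread.
public sealed class QueueSynchronizationContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object State)> _queue =
        new BlockingCollection<(SendOrPostCallback Callback, object State)>();

    // Called by the await machinery (possibly from a thread-pool thread)
    // to hand the continuation back to the owning thread.
    public override void Post(SendOrPostCallback d, object state) => _queue.Add((d, state));

    // Runs an async function and pumps its continuations on the calling thread.
    public static int Run(Func<Task<int>> asyncMain)
    {
        var previous = Current;
        var context = new QueueSynchronizationContext();
        SetSynchronizationContext(context);
        try
        {
            Task<int> task = asyncMain();
            // Once the async function has finished, let the pump loop end.
            task.ContinueWith(_ => context._queue.CompleteAdding(), TaskScheduler.Default);
            // The "message loop": blocks until a callback is posted, then runs it.
            foreach (var (callback, state) in context._queue.GetConsumingEnumerable())
            {
                callback(state);
            }
            return task.GetAwaiter().GetResult();
        }
        finally
        {
            SetSynchronizationContext(previous);
        }
    }
}
```

With this in place, code after an `await` inside `asyncMain` resumes on the thread that called `Run`, because the awaiter captured the current context and posted the continuation to it.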
Wait handles such as ManualResetEvent, Semaphore and ReaderWriterLock, as well as classic Window Messaging
These completely block the calling thread, so they cannot be used with async/await directly, as they depend fully on the Win32 threading model. But that overall model is somewhat similar to Task: you can wait on an event or a number of events, and dispatch your callbacks when needed. There are separate versions of some of these which are compatible with async/await.
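One example of an async-compatible counterpart is `SemaphoreSlim`, whose `WaitAsync` method returns a `Task` instead of blocking the thread. The sketch below is only an illustration (the `AsyncLockDemo` class and its counter are made up): it uses a semaphore with a count of one as a mutual-exclusion gate around a section that itself contains an `await`.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLockDemo
{
    // A semaphore with one slot acts as an async-friendly lock.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
    private static int _counter;

    public static async Task<int> IncrementAsync()
    {
        // Unlike Semaphore.WaitOne, this does not block the thread while waiting.
        await Gate.WaitAsync();
        try
        {
            int current = _counter;
            await Task.Delay(1);      // simulated async work inside the protected section
            _counter = current + 1;
            return _counter;
        }
        finally
        {
            Gate.Release();
        }
    }
}
```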
Waiting on an event is essentially a call into the kernel, saying "please suspend my thread until such-and-such happens."
What happens to native OS threads when they are suspended?
Native OS threads continuously run on processor cores. The Windows kernel scheduler sets hardware processor timers to interrupt threads and yield to others that may need to run. When a native thread blocks (for example, by waiting on a handle), it is removed from the runnable-thread queue; a thread that is only preempted by the scheduler stays runnable and goes back into the queue. As soon as a blocked thread is ready to go again, it is placed in the runnable queue, and will be run when the scheduler gets a chance.
If there are no more threads to run, the processor goes into a low-power HALT, and gets woken up on the next interrupt signal.
Task and async/await
This is a very large topic which I am mostly going to leave to others. But going back to my original premise that releasing a thread triggers an OS level callback: how does Task do this?
First things first, we have already made an error. A thread and a task are different things. A thread can only be suspended by the kernel; a task is just a unit of work that we want done, which we can pick up and drop as needed.
When an await is hit at the very deepest level (the point at which we want to suspend execution), a callback is registered as we mentioned above. When called, the callback function will queue the Task's continuation code to the scheduler for execution. Task utilizes the existing scheduler set up by the CLR to pick up and drop tasks and continuations as needed.
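As a rough picture of what an await amounts to, the sketch below writes the same one-step function twice: once with `await`, and once by hand using the awaiter's `OnCompleted` to register the continuation as a callback. The class and method names are invented for this example, and the hand-written version is a simplification: the compiler actually generates a state machine and also captures the current context, which is omitted here.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // The compiler-assisted version.
    public static async Task<int> AddOneWithAwait(Task<int> operation)
    {
        int value = await operation;
        return value + 1;
    }

    // Roughly equivalent hand-written version (simplified).
    public static Task<int> AddOneByHand(Task<int> operation)
    {
        var tcs = new TaskCompletionSource<int>();
        var awaiter = operation.GetAwaiter();

        // The "rest of the method" after the await, packaged as a callback.
        Action continuation = () =>
        {
            try { tcs.SetResult(awaiter.GetResult() + 1); }
            catch (Exception ex) { tcs.SetException(ex); }
        };

        if (awaiter.IsCompleted)
            continuation();                     // already done: just carry on
        else
            awaiter.OnCompleted(continuation);  // register the callback and return

        return tcs.Task;
    }
}
```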
Finally, the TaskScheduler is the class that implements logic as to how to schedule Tasks: should they be executed via the ThreadPool? Should they be marshalled back to the UI thread, or even just executed inline in a loop?
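To show how little a scheduler needs in order to make that decision, here is a minimal sketch of a custom `TaskScheduler` that simply runs every task inline on the thread that queued it. `InlineTaskScheduler` is a hypothetical class for illustration only; the default scheduler hands work to the ThreadPool instead.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical scheduler: executes each task immediately on the queuing thread.
public sealed class InlineTaskScheduler : TaskScheduler
{
    // Called when a task is scheduled; a real scheduler would enqueue it somewhere.
    protected override void QueueTask(Task task) => TryExecuteTask(task);

    // Called when a waiting thread offers to run the task itself.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => TryExecuteTask(task);

    // Used by debuggers; nothing is ever left waiting in this scheduler.
    protected override IEnumerable<Task> GetScheduledTasks() => Enumerable.Empty<Task>();
}
```

A task started with `Task.Factory.StartNew(work, CancellationToken.None, TaskCreationOptions.None, new InlineTaskScheduler())` has already run to completion by the time `StartNew` returns.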