
Many of the built-in IO functions in C# are non-blocking, which is to say they do not hold onto their thread while they wait for their operation to complete.

For example System.IO.File.ReadAllLinesAsync which returns a Task<string[]> is non-blocking.

It doesn't just pause the thread it's using; it actually releases the thread so other work can run on it.

I assume this is accomplished by calling into the OS in such a way that the OS calls back into the program when it has retrieved the file without the program having to waste a thread waiting for it.

Is it possible to create a non-blocking async task yourself?

Doing something like Thread.Sleep() clearly doesn't release the current thread like System.IO.File.ReadAllLinesAsync does.

I realize that a sleeping thread doesn't take up CPU resources, but it still takes up a thread which can be a problem in a web server handling numerous requests.

I'm not talking about how to spawn Tasks in general. I'm talking about what the built-in C# functions for handling File/Network calls do to free up their threads while they wait.

markv12
  • You have to await something. When creating a task that should await some external event, this is often a TaskCompletionSource. – Klaus Gütter Mar 28 '21 at 15:27
  • I did a quick Google search "C# task io" and the second hit was [this nice article by Microsoft](https://learn.microsoft.com/en-us/dotnet/standard/async-in-depth) is it helpful? – JHBonarius Mar 28 '21 at 15:36
  • @JHBonarius Yeah, it looks like this is a pretty deep topic that you usually don't have to deal with because the built-in functions handle it. It's good to know that there are multiple ways to implement a Task, not all of which create a thread. Thanks! – markv12 Mar 28 '21 at 15:53
  • You may want to take a look at this: [Why File.ReadAllLinesAsync() blocks the UI thread?](https://stackoverflow.com/questions/63217657/why-file-readalllinesasync-blocks-the-ui-thread). Sometimes the reality does not match our expectations. – Theodor Zoulias Mar 28 '21 at 18:15

4 Answers


For IO Bound Tasks

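For IO-bound tasks, you can define a method of type Task<T> and return your value of type T in the method. Note that the thread is only released if the body awaits an operation that is itself asynchronous; wrapping a blocking call changes the signature but not the behavior. For example, if you have a method string getHTML(string url), you can expose it through a Task-returning signature like so:

public async Task<string> getHTMLAsync(string url) {
    // No await here, so getHTML still runs synchronously on the caller's thread
    // (the compiler reports warning CS1998 for an async method without await).
    return getHTML(url);
}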

You can see an example of this in the reference source for the System.IO.File.ReadAllLinesAsync method.
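For a wrapper like this to release its thread, the body has to await an operation that is itself asynchronous. Here is a minimal sketch of that shape, assuming a hypothetical ReadTextAsync helper and a temporary file created only for the demo; File.ReadAllTextAsync and File.WriteAllTextAsync are the real .NET APIs doing the work.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: awaits a real asynchronous file API
// instead of wrapping a blocking call.
static async Task<string> ReadTextAsync(string path)
{
    // At the await, control returns to the caller rather than blocking here.
    return await File.ReadAllTextAsync(path);
}

// Demo setup: a temp file with known content.
string path = Path.GetTempFileName();
await File.WriteAllTextAsync(path, "hello");

string text = await ReadTextAsync(path);
Console.WriteLine(text);

File.Delete(path);
```

Whether the read is fully non-blocking all the way down depends on the runtime version and platform, as one of the comments on the question points out.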

For CPU Bound Tasks

The Task class in the System.Threading.Tasks namespace should provide the functionality you are looking for. You can use it to create a Task object to run whatever process you are trying to achieve. For example, if you have a method int LongRunner that takes a long time to execute and you would like to access it asynchronously, you could define Task<int> LongRunnerAsync:

public Task<int> LongRunnerAsync() {
    return Task.Run( () => LongRunner() );
}

There are several ways of defining your custom Task:

  • Defining a Task using the Task.Run(...) method. This is my default method for defining a Task as it is simple to write and starts the Task immediately. You can do this by calling:
Task.Run( () => {
    doWork();
});
  • Defining a Task to run a predefined action using the constructor. This allows you to define a Task that does not immediately start. This can be done with:
Action action = () => doWork();
Task task = new Task(action);
task.Start();
  • Defining a Task using the Task.Factory.StartNew(...) method. This method allows for more customization than Task.Run(...) but provides similar functionality. I would only recommend using this method if there is a specific reason that you need this over Task.Run(...)
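As a rough illustration of the last option, here is a sketch of Task.Factory.StartNew with an option that Task.Run does not expose. SumTo is a hypothetical stand-in for doWork(), and the choice of TaskCreationOptions.LongRunning is only an example of a "specific reason" to prefer StartNew.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical CPU-bound work standing in for doWork().
static int SumTo(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) total += i;
    return total;
}

// StartNew takes a cancellation token, creation options and a scheduler explicitly.
// LongRunning hints that the work should not occupy a regular thread-pool thread.
Task<int> task = Task.Factory.StartNew(
    () => SumTo(100),
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

int result = await task;
Console.WriteLine(result); // 5050
```

Passing TaskScheduler.Default explicitly avoids picking up whatever scheduler happens to be current, which is one of the usual pitfalls of StartNew.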

See Microsoft's documentation page.

bisen2
  • 486
  • 1
  • 4
  • 10
  • Nice answer, especially the reference to the docs. Still missing a lot, like async/await, threads, etc. Task.Run is not always the best solution. Plus, you need to wait for completion of tasks (or lose exceptions possibly thrown, etc.) – JHBonarius Mar 28 '21 at 15:02
  • I am aware of how tasks work in general. I'm talking about the specific case of a task that does not hold onto its thread while it waits for something (File/Network/etc...) All the examples you provided refer to Tasks that run CPU intensive code. I'm talking about a situation where you want to wait for an external process to complete without taking up a thread. Many built-in C# functions do this, but I don't know how. I'm wondering if it's possible to replicate that behavior. – markv12 Mar 28 '21 at 15:03
  • @markv12 Take a peek at my update to this answer and [this section of the Async In Depth guide for more info](https://learn.microsoft.com/en-us/dotnet/standard/async-in-depth#deeper-dive-into-tasks-for-an-io-bound-operation) – bisen2 Mar 28 '21 at 15:56
  • @JHBonarius I totally agree that there is a lot more to it than this answer talks about (or any SO answer could cover). I have updated it with some info on IO-bound tasks where Task.Run would not be the best solution. I felt that async/await and waiting for completion of tasks were a little out of scope for this question, but if you feel the answer would benefit from it feel free to add it in. – bisen2 Mar 28 '21 at 16:02

Fundamentally, every single async function that releases a thread ultimately compiles down to a callback, normally executed by the OS.

In modern terminology, this style is often called a Promise, but it has been part of all good operating systems since time immemorial. The general method is to take a callback function and register it, then start some kind of operation. When the operation completes, the callback is called.
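The register-then-start pattern can be sketched in C# with a TaskCompletionSource and a one-shot timer. This is only an illustration: DelayWithoutThread is a hypothetical helper name, and the timer stands in for whatever external operation eventually fires the callback.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: a delay that does not hold a thread while it waits.
static Task DelayWithoutThread(int milliseconds)
{
    var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Register the callback. This constructor passes the timer itself as the state object.
    var timer = new Timer(state =>
    {
        ((Timer)state!).Dispose();
        tcs.TrySetResult(true); // completes the Task and schedules its continuation
    });

    // Start the operation: fire once after the requested time, never repeat.
    timer.Change(milliseconds, Timeout.Infinite);
    return tcs.Task;
}

var sw = Stopwatch.StartNew();
await DelayWithoutThread(200);
Console.WriteLine($"Resumed after about {sw.ElapsedMilliseconds} ms");
```

Between the call and the timer callback, no thread belongs to this operation; the awaiting method is simply parked as a continuation on the Task.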

This goes all the way down to the processor level, where IO devices signal an interrupt line, which feeds through to the OS kernel, the kernel-mode drivers, user-mode drivers and finally some kind of wait handle that an application thread is waiting on (such as window messages or async IO).


Let's take a deeper look at one of the main examples to see how it's done. We'll go through the main .NET GitHub repo, as well as the Win32 docs on MSDN. Similar principles apply to most modern OSes. I'm going to assume a fair understanding already of basic IO operations and the basic components of modern PCs.

Bulk IO classes such as FileStream, Socket, PipeStream, SerialPort

These use quite similar methods. Let's look at just FileStream.

Going through the source, it utilizes a class called AsyncWindowsFileStreamStrategy, which in turn utilizes a Win32 API called Overlapped IO. It eventually passes through a callback function to ThreadPoolBoundHandle.AllocateNativeOverlapped, and takes the resulting OVERLAPPED struct to pass to Win32 APIs such as ReadFile.

We don't have the source for Win32, but on a general level, these functions will call through to the Kernel32 and ntdll APIs. These in turn move into kernel-mode, where file-system drivers pass over to disk drivers.

The system that most bulk IO hardware like drives and network adapters use is Direct Memory Access. The driver will just tell the hardware where in RAM to place the data. The hardware loads the data directly to RAM, completely bypassing the CPU.

It then signals an interrupt line to the CPU, which stops what it was doing and transfers control to the kernel's interrupt handler. This then transfers control back up the chain to the drivers, back into user-mode, and eventually the callback in the application is ready to go.

What picks up the callback in the application? The ThreadPool class (the native version), which uses an IO Completion Port (this is used to merge lots of IO callbacks into a single handle to wait upon). The native-level threads in our application continuously loop on a call to GetQueuedCompletionStatus, which blocks if there is nothing available. As soon as it returns, the relevant callback is fired, which feeds all the way back up to our FileStream and ultimately continues our function where we left off, as will be seen later.

This may or may not be on our original native thread, depending on how we have set up our SynchronizationContext. If we need to marshal a callback to the UI thread, this is done via a window message.
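From the application side, all of that machinery sits behind an ordinary await. Here is a small sketch, assuming a temporary file created just for the demo, that opens a FileStream with useAsync: true and prints the managed thread id before and after the read.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Demo setup: a temp file with known content.
string path = Path.GetTempFileName();
await File.WriteAllTextAsync(path, "overlapped");

byte[] buffer = new byte[64];
int bytesRead;

// useAsync: true requests asynchronous IO on the handle (overlapped IO on Windows).
using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, bufferSize: 4096, useAsync: true))
{
    Console.WriteLine($"Before await: thread {Environment.CurrentManagedThreadId}");
    bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length);
    // The continuation may run on a different thread-pool thread.
    Console.WriteLine($"After await: thread {Environment.CurrentManagedThreadId}");
}

string content = Encoding.UTF8.GetString(buffer, 0, bytesRead);
Console.WriteLine(content);
File.Delete(path);
```

The two thread ids may or may not differ on any given run; a small read can complete so quickly that the method never actually suspends.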


Wait handles such as ManualResetEvent, Semaphore and ReaderWriterLock, as well as classic Window Messaging

These completely block the calling thread, so they cannot be used with async/await directly, as they depend fully on the Win32 threading model. But that overall model is somewhat similar to Task: you can wait on an event or a number of events, and dispatch your callbacks when needed. There are separate versions of some of these which are compatible with async/await.
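One way to bridge a classic wait handle to async/await is to let the thread pool watch the handle and complete a Task when it is signaled. The sketch below assumes a hypothetical WaitOneAsync helper; ThreadPool.RegisterWaitForSingleObject is the real API underneath.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: exposes a WaitHandle as a Task without dedicating a thread to each wait.
static Task WaitOneAsync(WaitHandle handle)
{
    var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // The thread pool watches the handle and invokes the callback once it is signaled.
    RegisteredWaitHandle registration = ThreadPool.RegisterWaitForSingleObject(
        handle,
        (state, timedOut) => tcs.TrySetResult(true),
        null,
        Timeout.Infinite,
        executeOnlyOnce: true);

    // Release the registration once the wait has completed.
    tcs.Task.ContinueWith(_ => registration.Unregister(null), TaskScheduler.Default);
    return tcs.Task;
}

using var gate = new ManualResetEvent(false);
Task waiting = WaitOneAsync(gate);
Console.WriteLine(waiting.IsCompleted); // False: the event has not been set yet

gate.Set();
await waiting;
Console.WriteLine(waiting.IsCompleted); // True
```

For new code, an async-aware primitive such as SemaphoreSlim.WaitAsync is usually simpler than wrapping a kernel wait handle.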

A wait event is essentially a call into the kernel, saying "please suspend my thread until such-and-such happens."

What happens to native OS threads when they are suspended?

Native OS threads continuously run on processor cores. The Win32 kernel scheduler sets hardware processor timers to interrupt threads and yield to others that may need to run. At any point, if a native thread is suspended by the Win32 scheduler (either when asked or because of the scheduler yield), it is removed from the runnable-thread queue. As soon as a thread is ready to go again, it is placed in the runnable queue, and will be run when the scheduler gets a chance.

If there are no more threads to run, the processor goes into a low-power HALT, and gets woken up on the next interrupt signal.


Task and async/await

This is a very large topic which I am mostly going to leave to others. But going back to my original premise that releasing a thread triggers an OS level callback: how does Task do this?

First things first, we have already made an error. A thread and a task are different things. A thread can only be suspended by the kernel, whereas a task is just a unit of work that we want done, which we can pick up and drop as needed.

When an await is hit at the very deepest level (the point at which we want to suspend execution), any callback is registered as we mentioned above. When called, the callback function will queue the Task's continuation code to the scheduler for execution. Task utilizes the existing scheduler set up by the CLR to pick up and drop tasks and continuations as needed.

Finally, the TaskScheduler is the class that implements logic as to how to schedule Tasks: should they be executed via the ThreadPool? Should they be marshalled back to the UI thread, or even just executed inline in a loop?
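To make the "await registers a callback" idea concrete, here is a hand-written approximation of what an await arranges. This is a sketch only: the compiler actually generates a state machine, and the two TaskCompletionSource objects are demo scaffolding standing in for a real IO operation and for the rest of the method.

```csharp
using System;
using System.Threading.Tasks;

// "source" stands in for an IO operation; "done" reports when the continuation has run.
var source = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
var done = new TaskCompletionSource<int>();

// Roughly what `int value = await source.Task; ...` arranges behind the scenes:
var awaiter = source.Task.GetAwaiter();
awaiter.OnCompleted(() =>
{
    // The rest of the method runs here once the task completes.
    done.SetResult(awaiter.GetResult() + 1);
});

Console.WriteLine("Continuation registered; no thread is waiting.");

source.SetResult(41);               // stands in for the OS-level completion callback
Console.WriteLine(await done.Task); // 42
```

Where the continuation runs (thread pool, UI thread, inline) is decided by the captured SynchronizationContext or TaskScheduler, not by the code that calls SetResult.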

Charlieface

There seems to be a decent amount of discussion in the comments regarding this, but I'm not sure if any of them answered it how you wanted, so I'll try my best.

For now there are usually 2 ways that I can think of an async method being called without a Task. These are usually old APIs (such as SqlCommand.BeginExecuteNonQuery) that have already been replaced with Task-based calls. If you have a more specific scenario in mind, that would be helpful to provide better examples.
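For old APIs that follow the Begin/End pattern, the framework already ships an adapter, Task.Factory.FromAsync. Here is a minimal sketch, using a MemoryStream purely as a stand-in for any object exposing a BeginXxx/EndXxx pair (it is not a SqlCommand, and an in-memory stream does no real IO).

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// MemoryStream stands in for any stream exposing the older BeginRead/EndRead pair.
using Stream stream = new MemoryStream(new byte[] { 1, 2, 3, 4 });
byte[] buffer = new byte[4];

// FromAsync pairs the Begin/End methods and exposes them as a single Task<int>.
int read = await Task<int>.Factory.FromAsync(
    stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

Console.WriteLine(read); // 4
```

The returned Task completes when the End method's callback fires, so the calling code can simply await it.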

I'm talking about what the built-in C# functions for handling File/Network calls do to free up their threads while they wait.

You ask this, but you had already said 'I assume this is accomplished by calling into the OS in such a way that the OS calls back into the program'. You kind of answered your own question. These built in operations are doing calls that hand off to the OS and get alerted by the OS when they have completed.

In my examples, assume that CallFoo is calling some kind of OS operation that handles doing everything. The actual implementation of how the OS is called is not super important for you to worry about, but you can look into calling the Windows kernel from C# if you want to know more.

Async with Callback

Imagine the function looks something like this:

public void CallFoo(Action finishedCallback);

And you want to be able to call it like this:

public Task CallFoo();

I would define it something like this:

public Task CallFoo()
{
    var taskCompletionSource = new TaskCompletionSource();

    // Calls to the API that has a non blocking IO call but no async Task API
    CallFoo(() =>
    {
        // Callback is called when the IO task has finished.
        // SetResult will mark the returned Task as complete
        taskCompletionSource.SetResult();
    });

    return taskCompletionSource.Task;
}

Async with handle

The other way I can think of it working is with some kind of 'handle' that gets returned that specifies if the async task has been completed or not.

The method might look something like this:

public IAsyncHandle CallFoo();

In this case, I'd implement it something like this:

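// Named CallFooAsync so it does not collide with the handle-returning CallFoo() above.
public async Task CallFooAsync()
{
    var handle = CallFoo();

    // Poll the handle; Task.Delay yields the thread between checks.
    while (!handle.IsCompleted)
    {
        await Task.Delay(100);
    }
}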

This is less ideal because you're just polling to see if it's done, but it does use a lot fewer resources than calling Thread.Sleep. The obvious downside is that it doesn't react in real time to the async action finishing. You can lower or increase your delay depending on your needs.

Lolop

Here's the code that is run when you call System.IO.File.ReadAllLinesAsync:

private static async Task<string[]> InternalReadAllLinesAsync(string path, Encoding encoding, CancellationToken cancellationToken)
{
    using StreamReader sr = AsyncStreamReader(path, encoding);
    cancellationToken.ThrowIfCancellationRequested();
    List<string> lines = new List<string>();
    string item;
    while ((item = await sr.ReadLineAsync().ConfigureAwait(continueOnCapturedContext: false)) != null)
    {
        lines.Add(item);
        cancellationToken.ThrowIfCancellationRequested();
    }
    return lines.ToArray();
}

It's just plain-old async stuff. And if you drill in to .ReadLineAsync() it's all just async code. Nothing specifically special.

Enigmativity