
I'm trying to understand how and when to use async programming. I've got as far as I/O-bound operations, but I don't understand them, and I want to implement them from scratch. How can I do that?

Consider the example below which is synchronous:

private void DownloadBigImage() {
    var url = "https://cosmos-magazine.imgix.net/file/spina/photo/14402/180322-Steve-Full.jpg";
    new WebClient().DownloadFile(url, "image.jpg");
}

How do I implement an async version when all I have is the normal synchronous method DownloadBigImage, without using Task.Run? Task.Run would take a thread from the thread pool just to wait, which is wasteful!

Also do not use the special method that's already async! This is the purpose of this question: how do I make it myself without relying on methods which are already async? So, NO things like:

await new WebClient().DownloadFileTaskAsync(url, "image.jpg");

The available examples and documentation are lacking in this regard. The only thing I found is this: https://learn.microsoft.com/en-us/dotnet/standard/async-in-depth which says:

The call to GetStringAsync() calls through lower-level .NET libraries (perhaps calling other async methods) until it reaches a P/Invoke interop call into a native networking library. The native library may subsequently call into a System API call (such as write() to a socket on Linux). A task object will be created at the native/managed boundary, possibly using TaskCompletionSource. The task object will be passed up through the layers, possibly operated on or directly returned, eventually returned to the initial caller.

Basically I have to use a "P/Invoke interop call into a native networking library"... but how?
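The "task object created at the native/managed boundary, possibly using TaskCompletionSource" part of that quote can be illustrated without any P/Invoke. In the sketch below, a System.Threading.Timer stands in for the OS notification that an I/O request has finished (an assumption for illustration only; StartFakeIo and FakeIoAsync are hypothetical names, and a real library would receive this callback from the OS). Nothing blocks while the operation is pending; the Task is simply completed from the callback.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallbackToTask
{
    // Hypothetical stand-in for a native API: it starts an "operation" and returns at once;
    // onDone is invoked later from a timer callback, the way the OS would signal completion.
    public static void StartFakeIo(int delayMs, Action<string> onDone)
    {
        Timer timer = null;
        timer = new Timer(_ =>
        {
            timer.Dispose();
            onDone("payload");
        }, null, Timeout.Infinite, Timeout.Infinite);
        timer.Change(delayMs, Timeout.Infinite); // arm only after 'timer' is assigned
    }

    // The managed wrapper: create a TaskCompletionSource, complete it from the
    // callback, and hand its Task back to the caller immediately.
    public static Task<string> FakeIoAsync(int delayMs)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        StartFakeIo(delayMs, result => tcs.SetResult(result));
        return tcs.Task;
    }
}
```

A caller would just write string data = await CallbackToTask.FakeIoAsync(100);. RunContinuationsAsynchronously keeps the awaiting code from running inline on the thread that fires the callback, which is a common precaution when completing a TaskCompletionSource from a callback.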

Igor Popov
  • "also do not use the special method that's already async", what are you talking about here? Is this a question **you** have, or is this a task you've been given? It seems you're trying to do homework due to the way you've worded your question. Can you please clarify exactly what you want to do here? – Lasse V. Karlsen Jun 22 '18 at 10:11
  • It isn't possible to magically turn a sync method into async without rewriting it or wrapping it up in something like a thread, that's why this isn't documented, the concept doesn't exist. Instead you would write the code that talks to whatever async to begin with because most likely the thing you talk to already follows async principles and concepts. Things like sockets. – Lasse V. Karlsen Jun 22 '18 at 10:12
  • The real question here is which problem you're trying to solve. As you already know, there **already** exists async classes in the .NET framework that does what you want, why do you feel the need to reinvent the wheel? – Lasse V. Karlsen Jun 22 '18 at 10:14
  • And just to be clear, "reinvent downloading a file over http from scratch using nothing but native async socket operations" is **far** too broad for a Stack Overflow question. There are so many things you need to know about in order to do that so no matter how detailed an answer you could get to something here there will still be tons more details you need before you're done. Please narrow your question down to something manageable. – Lasse V. Karlsen Jun 22 '18 at 10:16
  • I can't understand why you refuse to use `Task.Run()`. I can see that you wrote that it _"will use a thread from the thread pool"_, but why do you think it will be wasteful? This thread will download a large file from the net, won't it? – vasily.sib Jun 22 '18 at 10:26
  • I was reading the docs and this is the part that's missing everywhere... I just wanted to understand how it works (not a homework). So basically your answer is that this is already available from the .NET framework, right? But what if you're writing a library like JSON serialization or an ORM framework? How do you proceed in that case? – Igor Popov Jun 22 '18 at 10:26
  • @LasseVågsætherKarlsen to me it seems more like he is trying to understand how async works under the hood. No sane person would try to rewrite async functionality *in* C#... My guess is you can't since this is really low level stuff. – Freggar Jun 22 '18 at 10:26
  • @IgorPopov the core async functionality pretty much only makes sense when you interact with your network card or your hard drive (or maybe some other things that don't come to mind right now). The low-level libraries will cover that for you (in the article you linked, there is also an explanation of how the IRQs are processed). Everything else pretty much just *wraps* this core functionality. Your JSON serialization would probably boil down to a hard drive write. Your ORM write will probably boil down to a network write and so on. – Freggar Jun 22 '18 at 10:33
  • Not sure why this question is marked as too broad. I find it very specific and reasonable. You just have to take the time to understand his goal. – usr Jun 22 '18 at 10:55
  • I think you should use ThreadPool.UnsafeQueueNativeOverlapped, https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool.unsafequeuenativeoverlapped – yangtam Sep 23 '20 at 07:37

4 Answers


This is a great question which really isn't explained well in most texts about C# and async.

I searched for this for ages, thinking I could and maybe should be implementing my own async I/O methods. If a method or library I was using didn't have async methods, I thought I should somehow wrap those functions in code that made them asynchronous. It turns out that this isn't really feasible for most programmers. Yes, you can spawn a new thread using new Thread(() => {...}).Start(), and that does make your code asynchronous, but creating a thread is an expensive overhead for an asynchronous operation. It can certainly free up your UI thread so your app stays responsive, but it doesn't create a truly async operation the way HttpClient.GetAsync() is truly asynchronous.

This is because async methods in the .NET libraries use the "standard P/Invoke asynchronous I/O system in .NET" to call low-level OS code that doesn't require a dedicated CPU thread while outbound I/O (networking or storage) is in progress. No thread is dedicated to the work; the OS signals the .NET runtime when the operation is done.

I'm not familiar with the details, but this knowledge is enough to free me from trying to implement async I/O myself and lets me focus on using the async methods already present in the .NET libraries (such as HttpClient.GetAsync()). More interesting info can be found here (Microsoft async deep dive) and a nice description by Stephen Cleary here.

Torfi
  • What you wrote should be the intro for async/await books/tutorials. Since the documentation is so lacking I also misunderstood things and did it the wrong way. Thank you! – Igor Popov May 02 '20 at 08:58

I think this is a very interesting question and a fun learning exercise.

Fundamentally, you cannot use any existing API that is synchronous. Once it's synchronous, there is no way to make it truly asynchronous. You correctly identified that Task.Run and its equivalents are not a solution.

If you refuse to call any async .NET API, then you need to use P/Invoke to call native APIs. This means calling the WinHTTP API or using sockets directly. This is possible, but I don't have the experience to guide you.

Rather, you can use async managed sockets to implement an async HTTP download.

Start with the synchronous code (this is a raw sketch):

using (var s = new Socket(...))
{
 s.Connect(...);
 s.Send(GetHttpRequestBytes());
 var response = new StreamReader(new NetworkStream(s)).ReadToEnd();
}

This very roughly gets you an HTTP response as a string.

You can easily make this truly async by using await.

using (var s = new Socket(...))
{
 await s.ConnectAsync(...);
 await s.SendAsync(GetHttpRequestBytes());
 var response = await new StreamReader(new NetworkStream(s)).ReadToEndAsync();
}
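A runnable version of the await-based sketch above might look like this. It is a minimal sketch under several assumptions: plain HTTP/1.1 without TLS, a "Connection: close" request header so that reading to end-of-stream yields the whole response, .NET 5 or later, and a hypothetical helper name HttpGetRawAsync. It writes through a NetworkStream rather than calling Socket.SendAsync directly, because NetworkStream.WriteAsync sends the whole buffer.

```csharp
using System.IO;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class RawHttp
{
    // Sends a minimal HTTP/1.1 GET and returns the raw response (status line, headers, body).
    public static async Task<string> HttpGetRawAsync(string host, int port, string path)
    {
        using var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
        await socket.ConnectAsync(host, port); // no thread waits while connecting

        // "Connection: close" asks the server to close the connection after the response,
        // so reading until end-of-stream yields the whole message.
        var request = $"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n";
        byte[] requestBytes = Encoding.ASCII.GetBytes(request);

        using var stream = new NetworkStream(socket, ownsSocket: false);
        await stream.WriteAsync(requestBytes, 0, requestBytes.Length);

        using var reader = new StreamReader(stream, Encoding.UTF8);
        return await reader.ReadToEndAsync();
    }
}
```

For the https:// URL in the question you would additionally wrap the NetworkStream in an SslStream and call AuthenticateAsClientAsync before writing; header parsing, chunked encoding and redirects are left out entirely.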

If you consider await cheating with respect to your exercise goals you would need to write this using callbacks. This is awful so I'm just going to write the connect part:

var s = new Socket(...);
s.BeginConnect(..., ar => {
   s.EndConnect(ar); // completes the connect (throws here if it failed)
   //perform next steps here
}, null);

Again, this code is very raw, but it shows the principle. Instead of waiting for an I/O to complete (which happens implicitly inside Connect), you register a callback that is called when the I/O is done. That way your main thread continues to run. The downside is that this turns your code into spaghetti.

You also need to handle disposal safely across callbacks. This is a problem because exception handling (try/catch/finally, using) cannot span callbacks. Also, you likely need to write a read loop if you don't want to rely on the framework to do that. Async loops can be mind-bending.
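Here is a sketch of such a callback-based read loop, using the Begin/End methods that every Stream exposes and no await at all. The "loop" is each callback starting the next read, and every callback has to catch exceptions itself and forward them by hand, because no try/catch can surround the whole operation. ReadAll and its onDone signature are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

public static class CallbackReadLoop
{
    // Reads the whole stream using BeginRead/EndRead callbacks and reports either the
    // bytes or an exception through onDone. If the stream's BeginRead does real async
    // I/O, no thread is blocked while a read is pending.
    public static void ReadAll(Stream stream, Action<byte[], Exception> onDone)
    {
        var buffer = new byte[4096];
        var collected = new MemoryStream();

        void ReadNext()
        {
            try
            {
                stream.BeginRead(buffer, 0, buffer.Length, OnRead, null);
            }
            catch (Exception ex) { onDone(null, ex); }
        }

        void OnRead(IAsyncResult ar)
        {
            int n;
            try
            {
                n = stream.EndRead(ar);
            }
            catch (Exception ex) { onDone(null, ex); return; } // errors must be forwarded manually

            if (n == 0) { onDone(collected.ToArray(), null); return; } // end of stream
            collected.Write(buffer, 0, n);
            ReadNext(); // the "loop": start the next read from inside the callback
        }

        ReadNext();
    }
}
```

If a stream completes reads synchronously and invokes the callback on the same thread, this recursion keeps growing the stack; robust APM code checks IAsyncResult.CompletedSynchronously to avoid that, which is part of what makes these loops mind-bending.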

usr
  • _"You correctly identified that Task.Run and it's equivalents are not a solution."_ why? Can you please point me to the docs, where this is explained? – vasily.sib Jun 22 '18 at 10:43
  • @usr thank you for your explanation... it makes sense now... the basic idea is that I should use the async methods from the language and should very rarely (if at all) get to write P/Invoke or callbacks as you described. Please correct me if I'm wrong... – Igor Popov Jun 22 '18 at 10:47
  • @vasily.sib the point of async IO is to not block any thread while the IO is running (think of a 30s webservice call, don't want to block a thread that long). `Task.Run` is a convenient way to provision a thread. If you call a sync API on it it will be blocked. – usr Jun 22 '18 at 10:48
  • @IgorPopov In a production application you would use the most high-level API that gets the job done. Here, since you are just trying to learn, you must choose how low a level you want to go. I would suggest that you do not use PInvoke, since it's a lot of pointless work. You can see how high-level async APIs are constructed from low-level ones by just using low-level managed APIs. I hope I got what your comment is about. – usr Jun 22 '18 at 10:49
  • @vasily.sib I'd say `Task.Run` is a common pattern when you have to call an `async` method inside of a **sync** method. What OP is asking here is: let's assume you have to create a method which will call `DownloadBigImage()` method (which is sync). How would you make your own method `async` when there is no `DownloadBigImageAsync()` alternative? Using `Task.Run` for this purpose wouldn't help at all. – Alisson Reinaldo Silva Jun 22 '18 at 11:08
  • @Alisson `public async Task DownloadBigImageAsync(string src) => await Task.Run(() => DownloadBigImage(src));`? – vasily.sib Jun 23 '18 at 02:42
  • @usr, thanks for explanation, but please, explain one more thing (two things actually). 1st: Why do you think, that blocking a thread (non UI-thread) is a bad idea? I always think, that the point of async is to "block worker thread instead of UI-thread". 2nd: in case of callbacks, how do you await result from callback? – vasily.sib Jun 23 '18 at 02:49
  • @vasily.sib async IO blocks no thread at all. An IO is just a data structure in the OS kernel. Sync IO is an async IO plus a wait in the kernel. Blocking threads has the downside of holding up 1MB of thread stack (on .NET) and OS resources for that time. In most apps this is not at all an issue and async IO is far overused on .NET these days. But when you get into the 100s of threads that tends to be a perf and reliability issue so async IO becomes attractive. / 2: You do not wait for the callback. The code must be structured so that no wait is needed. The "next" code goes inside the callback. – usr Jun 23 '18 at 09:17
  • @usr, thanks a lot! now I get a point. Sure, 100s of threads is not a good solution at all, but a single blocking thread to download something from network is good enough for me (despite of 1MB memory overhead). – vasily.sib Jun 23 '18 at 16:17
  • @vasily.sib yeah, it's mostly a non-issue. Do not use async IO if not needed. Causes you additional issues with zero gain. It's a tremendous fad these days. – usr Jun 23 '18 at 20:10

TL;DR: Generally you can, by using TaskCompletionSource.

If you only have blocking calls available, then you cannot do this. But usually there are "old" asynchronous methods that use neither async nor Task but rely on callbacks instead. In that case you can use a TaskCompletionSource both to create a Task that can be returned and to mark that Task as completed when the callback fires.

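Example using the older event-based methods of WebClient (DownloadFileAsync plus the DownloadFileCompleted event), but programmed in a later .NET that has Task. The non-generic TaskCompletionSource used here requires .NET 5 or later; on older versions use TaskCompletionSource<object> and SetResult(null):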

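    public Task DownloadCallbackToAsync(string url, string filename)
    {
        var client = new WebClient();
        var taskCreator = new TaskCompletionSource();

        // Runs when the download has finished: dispose the client (no longer needed)
        // and complete, fault or cancel the Task to match the outcome.
        client.DownloadFileCompleted += (sender, args) =>
        {
            client.Dispose();
            if (args.Error != null) taskCreator.SetException(args.Error);
            else if (args.Cancelled) taskCreator.SetCanceled();
            else taskCreator.SetResult();
        };

        // Starts the download and returns immediately; DownloadFileAsync takes a Uri, not a string.
        client.DownloadFileAsync(new Uri(url), filename);

        return taskCreator.Task;
    }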

Here you immediately start the download and return a Task. If you await the Task in the calling method, execution will not continue until the callback (DownloadFileCompleted) has run.

Note that this method itself is not marked async, because it does not need to await a Task.
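For the other common pre-Task style, Begin/End method pairs (the APM pattern), the framework already contains this TaskCompletionSource plumbing in TaskFactory.FromAsync. A minimal sketch wrapping Stream.BeginRead/EndRead; ReadChunkAsync is a hypothetical name:

```csharp
using System.IO;
using System.Threading.Tasks;

public static class ApmToTask
{
    // Wraps a BeginRead/EndRead pair in a Task<int>: FromAsync starts the operation,
    // calls EndRead from the completion callback, and completes or faults the Task.
    public static Task<int> ReadChunkAsync(Stream stream, byte[] buffer, int offset, int count)
    {
        return Task<int>.Factory.FromAsync(stream.BeginRead, stream.EndRead, buffer, offset, count, null);
    }
}
```

Like the WebClient example, this method is not marked async: it only converts the callback-based operation into a Task that callers can await.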

Ykok

Create a new task that executes the synchronous code. The task will run on a thread-pool thread.

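private async Task DownloadBigImage()
{
    await Task.Run(() =>
    {
        var url = "https://cosmos-magazine.imgix.net/file/spina/photo/14402/180322-Steve-Full.jpg";
        // The blocking download occupies this thread-pool thread until it finishes.
        using (var client = new WebClient())
        {
            client.DownloadFile(url, "image.jpg");
        }
    });
}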
Solarin