
So I've been digging into the implementation of HttpClient.SendAsync via Reflector. What I specifically wanted to find out was the flow of execution of these methods, and to determine which API gets called to execute the asynchronous IO work.

After exploring the various classes inside HttpClient, I saw that internally it uses HttpClientHandler which derives from HttpMessageHandler and implements its SendAsync method.

This is the implementation of HttpClientHandler.SendAsync:

protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
    if (request == null)
    {
        throw new ArgumentNullException("request", SR.net_http_handler_norequest);
    }

    this.CheckDisposed();
    this.SetOperationStarted();

    TaskCompletionSource<HttpResponseMessage> source = new TaskCompletionSource<HttpResponseMessage>();

    RequestState state = new RequestState 
    {
        tcs = source,
        cancellationToken = cancellationToken,
        requestMessage = request
    };

    try
    {
        HttpWebRequest request2 = this.CreateAndPrepareWebRequest(request);
        state.webRequest = request2;
        cancellationToken.Register(onCancel, request2);

        if (ExecutionContext.IsFlowSuppressed())
        {
            IWebProxy proxy = null;

            if (this.useProxy)
            {
                proxy = this.proxy ?? WebRequest.DefaultWebProxy;
            }
            if ((this.UseDefaultCredentials || (this.Credentials != null)) || ((proxy != null) && (proxy.Credentials != null)))
            {
                this.SafeCaptureIdenity(state);
            }
        }

        Task.Factory.StartNew(this.startRequest, state);
    }
    catch (Exception exception)
    {
        this.HandleAsyncException(state, exception);
    }
    return source.Task;
}

What I found weird is that the above uses Task.Factory.StartNew to execute the request while generating a TaskCompletionSource<HttpResponseMessage> and returning the Task created by it.

Why do I find this weird? Well, we go on a lot about how I/O-bound async operations have no need for extra threads behind the scenes, and how it's all about overlapped IO.

Why is this using Task.Factory.StartNew to fire an async I/O operation? This means that SendAsync isn't using purely async control flow to execute this method, but is spinning up a ThreadPool thread "behind our back" to execute its work.

Yuval Itzchakov

1 Answer


this.startRequest is a delegate that points to StartRequest which in turn uses HttpWebRequest.BeginGetResponse to start async IO. HttpClient is using async IO under the covers, just wrapped in a thread-pool Task.

That said, note the following comment in SendAsync:

// BeginGetResponse/BeginGetRequestStream have a lot of setup work to do before becoming async
// (proxy, dns, connection pooling, etc).  Run these on a separate thread.
// Do not provide a cancellation token; if this helper task could be canceled before starting then 
// nobody would complete the tcs.
Task.Factory.StartNew(startRequest, state);

This works around a well-known problem with HttpWebRequest: some of its processing stages are synchronous. That is a flaw in that API. HttpClient avoids blocking the caller by moving that synchronous setup work (proxy, DNS, connection pooling) to the thread pool.
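To make the shape of that workaround concrete, here is a minimal sketch of the same pattern with no networking involved. It is not the BCL code: `BlockingSetup` and `AsyncPhase` are hypothetical stand-ins for the synchronous setup inside BeginGetResponse and for the real overlapped IO, and `StartNewSketch` is an invented name. Only the structure (a TaskCompletionSource returned to the caller, a thread-pool task for the blocking part, a callback that completes the source) mirrors what SendAsync does.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StartNewSketch
{
    // Hypothetical stand-in for the synchronous setup (proxy, DNS, connection pooling).
    static void BlockingSetup() => Thread.Sleep(50);

    // Hypothetical stand-in for the truly asynchronous phase: no thread waits on it.
    static Task<string> AsyncPhase(string url) =>
        Task.Delay(50).ContinueWith(_ => "response:" + url);

    public static Task<string> SendAsync(string url)
    {
        var tcs = new TaskCompletionSource<string>();

        // No cancellation token is passed here: if the helper task could be
        // canceled before it starts, nobody would ever complete tcs.
        Task.Factory.StartNew(() =>
        {
            try
            {
                BlockingSetup(); // occupies a thread-pool thread, not the caller
                AsyncPhase(url).ContinueWith(t =>
                {
                    if (t.IsFaulted) tcs.TrySetException(t.Exception.InnerExceptions);
                    else tcs.TrySetResult(t.Result);
                });
            }
            catch (Exception ex)
            {
                tcs.TrySetException(ex); // setup failures still reach the caller
            }
        });

        return tcs.Task; // returned immediately, before the setup has run
    }

    public static void Main()
    {
        Task<string> pending = SendAsync("http://example.com");
        Console.WriteLine(pending.Result); // blocking here only to keep the demo short
    }
}
```

The caller gets its task back right away, and a pool thread is occupied only for the duration of `BlockingSetup`, which is the trade-off discussed next.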

Is that good or bad? It is good because it makes HttpClient non-blocking and suitable for use in a UI. It is bad because we are now using a thread for blocking work although we expected not to use threads at all. This reduces the benefits of using async IO.

Actually, this is a nice example of mixing sync and async IO. There is nothing inherently wrong with using both. HttpClient and HttpWebRequest are using async IO for long-running blocking work (the HTTP request). They are using threads for short-running work (DNS, ...). That's not a bad pattern in general. We are avoiding most blocking and we only have to make a small part of the code async. A typical 80-20 trade-off. It is not good to find such things in the BCL (a library), but in application-level code it can be a very smart trade-off.

It seems it would have been preferable to fix HttpWebRequest. Maybe that is not possible for compatibility reasons.

usr
  • Holy crap! Thanks. I was looking at http://sourceof.net and looking for that method implementation and it doesn't show any, only method signatures. That is indeed a shame! I wonder if anyone from MSFT can comment on that. – Yuval Itzchakov Jul 27 '14 at 17:53
  • It seems like a pretty small implementation detail, but this is a rather big deal, isn't it? If I'm using `HttpClient.XXXAsync` methods to issue many requests at once, I'm not expecting each of them to internally use a ThreadPool thread. This isn't documented anywhere. – Yuval Itzchakov Jul 27 '14 at 17:57
  • @YuvalItzchakov yes, I'm disappointed. This is below the usual quality standards that we are used to when using the BCL. – usr Jul 27 '14 at 18:02
  • That said, don't fear the thread. Synchronous IO isn't that harmful and there is an irrational fear of it going on right now. It wouldn't really cause any damage to even have 100s of threads dedicated to DNS resolution. The main problem with that is the 1MB of stack size per thread. That's mostly it. – usr Jul 27 '14 at 18:02
  • The 1MB of stack is allocated anyway when the ThreadPool initializes, isn't it? Assuming we don't go beyond the default number of allocated threads. – Yuval Itzchakov Jul 27 '14 at 18:06
  • 1MB per thread. The memory usage depends on how many threads are spawned. If DNS is really quick, like it is most of the time, a few threads can process all of the load. – usr Jul 27 '14 at 18:09
  • It isn't only the 1MB stack, it's the Thread Local Storage, Thread Environment Block, kernel-mode stack, etc. – Yuval Itzchakov Jul 27 '14 at 19:39
  • @YuvalItzchakov true, yet that 1MB is most of it. I just started 100k threads on my machine as a test using testlimits.exe. Threads are really not that big of a deal. 100k threads ought to be enough for anybody! :) – usr Jul 27 '14 at 20:03
  • @usr Synchronous IO can be harmful in async code if the sync calls block for long. It can cause the task system to stall, as it is a cooperative system. It does have some recovery logic: if stalls last too long, a new thread-pool thread will start processing tasks, which helps hide the problem. Too many threads scale negatively when CPU bound. In some extreme situations, you can become context-switching bound, showing low throughput and low CPU usage. – Bengie Apr 11 '17 at 14:06
  • @Bengie this problem exists but it does not relate to async IO in any way. That's just classic thread pool exhaustion (which must be strictly avoided, as you say). Whether to use sync or async IO is a cost-benefit trade-off (in code paths not on the UI thread). The main cost of async is dev time. There's no strict rule to always use either of them. – usr Apr 17 '17 at 11:24