
In my app I need to make a lot of parallel HTTP requests, and I have read that the proper way to do this is with async/await. For each request I need to read the string content of the response (often the HTML of some site), and my question is: what is the best way to do that?

My current implementation:

public static async Task<string> GetStringContentAsync(HttpWebRequest webRequest)
{
    try
    {
        using (var response = (HttpWebResponse) await webRequest.GetResponseAsync()
                                                                .ConfigureAwait(false))
        {
            var content = await GetStringContentFromResponseAsync(response)
                               .ConfigureAwait(false);
            return content;
        }
    }
    catch (Exception)
    {
        return null;
    }
}


private static async Task<string> GetStringContentFromResponseAsync(HttpWebResponse response)
{
    using (var responseStream = GetResponseStream(response))
    {
        if (responseStream == null)
            return null;

        using (var streamReader = new StreamReader(responseStream))
        {
            var content = await streamReader.ReadToEndAsync()
                                            .ConfigureAwait(false);
            return content;
        }
    }
}

private static Stream GetResponseStream(HttpWebResponse webResponse)
{
    var responseStream = webResponse.GetResponseStream();
    if (responseStream == null)
        return null;

    Stream stream;
    switch (webResponse.ContentEncoding.ToUpperInvariant())
    {
        case "GZIP":
            stream = new GZipStream(responseStream, CompressionMode.Decompress);
            break;
        case "DEFLATE":
            stream = new DeflateStream(responseStream, CompressionMode.Decompress);
            break;
        default:
            stream = responseStream;
            break;
    }
    return stream;
}
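As a side note, the manual gzip/deflate handling above can reportedly be avoided entirely, since HttpWebRequest can decompress responses itself. A minimal sketch (the URL is a placeholder):

```csharp
using System.Net;

var request = (HttpWebRequest) WebRequest.Create("http://example.com/");
// Ask the framework to decompress gzip/deflate transparently;
// GetResponseStream() then already returns decoded content.
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
```

With this set, a custom GetResponseStream wrapper like the one above would be unnecessary.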

And an example of usage:

var httpWebRequest = (HttpWebRequest) WebRequest.Create("http://stackoverflow.com/");
var content = await HttpHelper.GetStringContentAsync(httpWebRequest)
                              .ConfigureAwait(false);

Is this a correct implementation, or can we improve something here? Maybe I'm adding overhead by using async/await when reading the stream?

The reason for my question is that when I use my code like this:

for (var i = 0; i < 1000; i++)
{
    Task.Run(async () =>
    {
        var httpWebRequest = (HttpWebRequest) WebRequest.Create("http://google.com/");
        var content = await HttpHelper.GetStringContentAsync(httpWebRequest)
                                      .ConfigureAwait(false);
    });
}

these tasks take too long to execute, even though a single request to Google is very fast. I thought the async requests in this example would all finish at almost the same time, and that this time would be pretty close to the time of "one Google request".
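One thing worth noting about the loop: the tasks started by Task.Run are never awaited, so their completion time is not actually being observed. A hedged sketch of starting all requests and awaiting their completion together, assuming the same HttpHelper.GetStringContentAsync helper from above:

```csharp
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

var tasks = new List<Task<string>>();
for (var i = 0; i < 1000; i++)
{
    var request = (HttpWebRequest) WebRequest.Create("http://google.com/");
    // GetStringContentAsync is already async; no Task.Run wrapper is needed.
    tasks.Add(HttpHelper.GetStringContentAsync(request));
}

// Completes when every request has finished (or returned null on error).
var contents = await Task.WhenAll(tasks).ConfigureAwait(false);
```

This also makes it possible to time the whole batch accurately, which the fire-and-forget loop cannot do.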

EDIT: I forgot to say that I know about ServicePointManager.DefaultConnectionLimit and set it to 5000 in my app, so that is not the problem. I can't use HttpClient because my final goal is to make 100-300 requests at a time through different proxies. If I understand correctly, HttpClient can work with only one proxy at a time and can't be configured per request.
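For what it's worth, HttpClient can be bound to a proxy through HttpClientHandler, so one client instance per proxy is a common pattern. A minimal sketch (the proxy address is a hypothetical placeholder):

```csharp
using System.Net;
using System.Net.Http;

var handler = new HttpClientHandler
{
    Proxy = new WebProxy("http://127.0.0.1:8888"), // hypothetical proxy address
    UseProxy = true,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};

// One HttpClient per proxy; each client routes all of its requests
// through the proxy configured on its handler.
var client = new HttpClient(handler);
var content = await client.GetStringAsync("http://stackoverflow.com/");
```

With 100-300 proxies this means a small pool of HttpClient instances rather than per-request proxy configuration.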

  • Simplify your code by using AutomaticDecompression. – usr Mar 26 '15 at 09:19
  • You mean that I can remove my GetResponseStream method and use something like this: request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate? – Юрий Бржозовский Mar 26 '15 at 09:30
  • All of your code seems unnecessary. Downloading a URL usually takes 1-2 lines with HttpClient or WebClient. – usr Mar 26 '15 at 09:55
  • I need to use HttpWebRequest because I need to configure it with many parameters like proxy and cookies. So the easy way with HttpClient is not for me. – Юрий Бржозовский Mar 26 '15 at 10:00
  • @ЮрийБржозовский: `HttpClient` has [full support for proxies and cookies](https://msdn.microsoft.com/en-us/library/system.net.http.httpclienthandler%28v=vs.118%29.aspx). Your timing problem is probably due to `ServicePointManager.DefaultConnectionLimit` and/or throttling by Google. – Stephen Cleary Mar 26 '15 at 12:36
  • I forgot to say that I know about ServicePointManager.DefaultConnectionLimit and set it to 5000 in my app, so that is not the problem. What about my methods? Is everything OK with them? About HttpClient: my goal is to make 100-300 requests at a time through different proxies, and if I understand correctly, HttpClient can work with one proxy at a time and can't be configured per request. – Юрий Бржозовский Mar 26 '15 at 13:01

2 Answers


That's a tricky one. Since you already know about DefaultConnectionLimit, that's a good start, but there are two more interesting and rather surprising settings:

  httpRequest.ServicePoint.ConnectionLeaseTimeout

  httpRequest.ServicePoint.MaxIdleTime

Your latencies might be caused by their default behavior: connections are held by the ServicePoint while you try to make the next request.
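A minimal sketch of adjusting those two settings on the request's ServicePoint; the millisecond values are illustrative, not recommendations:

```csharp
using System.Net;

var request = (HttpWebRequest) WebRequest.Create("http://google.com/");
// Both values are in milliseconds.
request.ServicePoint.ConnectionLeaseTimeout = 60 * 1000; // recycle connections periodically
request.ServicePoint.MaxIdleTime = 10 * 1000;            // close idle connections sooner
```

Note that the ServicePoint is shared across all requests to the same host, so setting this once affects the whole pool.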

maiksaray
  • Big thanks for your answer. One more comment on my situation: when I'm doing 100-200 async HTTP requests in parallel (using Task.Run), they take a lot of time to finish. But I thought they should finish about as fast as one request, because they are all async and don't require many resources (my CPU and network usage stayed below 4% the whole time). So my computer isn't the bottleneck, and there are plenty of resources the requests could use to perform better. So why are they so slow in that situation? Could something be wrong with the timeouts? Thank you. – Юрий Бржозовский Mar 27 '15 at 13:01

Here's the answer to your issue: https://msdn.microsoft.com/en-us/library/86wf6409(v=vs.90).aspx

Using synchronous calls in asynchronous callback methods can result in severe performance penalties. Internet requests made with WebRequest and its descendants must use Stream.BeginRead to read the stream returned by the WebResponse.GetResponseStream method.

That means absolutely no synchronous code (including awaits) when reading the response stream. But even that isn't enough, as DNS lookups and the TCP connection are still blocking. If you can use .NET 4.0, there's the much easier-to-use System.Net.Http.HttpClient class. Otherwise, you can use System.Threading.ThreadPool, which is the workaround I ended up using on 3.5:

ThreadPool.QueueUserWorkItem((o) => {
    // make a synchronous request via HttpWebRequest
});
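For comparison, on frameworks where HttpClient is available, the whole download mentioned above shrinks to a couple of lines (a sketch):

```csharp
using System.Net.Http;

using (var client = new HttpClient())
{
    // GetStringAsync handles the response stream asynchronously internally.
    var html = await client.GetStringAsync("http://stackoverflow.com/");
}
```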
blade