
EDIT: added reproduction samples. I am running this (on all servers) on Ubuntu 18.04 with .NET Core 2.2.203.

EDIT: tested from home on my Windows 10 laptop; same results.

I have a piece of very simple code for HttpClient (static as recommended, but I tested with using() as well):

sw.Start(); // stopwatch
var response = client.GetAsync(url).Result; // Result is a property, not a method
sw.Stop();
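
For reference, a minimal sketch of the two usage patterns being compared (both showed the same slowdown; class and parameter names are illustrative, not from our real code):

using System.Net.Http;

class Patterns
{
    // Pattern 1: one static, reused instance (the recommended approach)
    static readonly HttpClient client = new HttpClient();

    // Pattern 2: a short-lived instance per request (tested as well; generally
    // discouraged because it can exhaust sockets under load)
    static void FetchOnce(string url)
    {
        using (var tempClient = new HttpClient())
        {
            var response = tempClient.GetAsync(url).Result;
        }
    }
}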

and then for curl:

time curl -L "url" > /dev/null

and for lynx:

time lynx "url" > /dev/null

The difference is staggering; it really depends on the requested server/url, but I'm seeing HttpClient come in anywhere from 2x to 50x slower than curl/lynx for requests from the same server.

I tried all the fixes I could find:

• HttpClientHandler without a proxy (UseProxy = false, Proxy = null)

• Using await instead of .Result (not that it should make a difference, and indeed it does not; see the sketch after this list)

• WebClient

• ModernHttpClient

• the Curl wrapper CurlThin

That last option (obviously) did give the right results; the rest (the .NET options) are just incredibly slow.
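
For reference, a hedged sketch of what the "await instead of .Result" variant looks like (an async Main doing the same measurement; the class name is illustrative). Timing-wise it behaves exactly like the .Result version:

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class AwaitVariant
{
    // Same no-proxy handler as in the repro code further down
    static readonly HttpClient client = new HttpClient(
        new HttpClientHandler { Proxy = null, UseProxy = false });

    static async Task Main(string[] args)
    {
        var sw = Stopwatch.StartNew();
        var response = await client.GetAsync(args[0]); // await instead of .Result
        sw.Stop();
        Console.WriteLine($"Spent {sw.ElapsedMilliseconds}ms");
    }
}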

For now I'm using the Curl wrappers, because the .NET timings are just wrong and are slowing our stack down.

Has anyone seen this before? I tried (as you can see above) all the 'fixes' that Googling turned up, but none of them helped.

EDIT: from Matthiee in the comments: if you are running Windows with PowerShell, this reproduces it too (note .TotalMilliseconds here; the .Milliseconds property only returns the sub-second component of the TimeSpan):

(Measure-Command -Expression { $site = Invoke-WebRequest -Uri "reddit.com" }).TotalMilliseconds

EDIT: Code to reproduce; use with:

dotnet run -- https://reddit.com

using System;
using System.Diagnostics;
using System.Net.Http;

namespace Download.Playground
{
    class Program
    {
        static HttpClient client;

        static void Main(string[] args)
        {
            // Handler with the proxy explicitly disabled, one of the suggested fixes
            HttpClientHandler hch = new HttpClientHandler();
            hch.Proxy = null;
            hch.UseProxy = false;
            client = new HttpClient(hch);

            // Time a single GET of the URL passed on the command line
            Stopwatch sw = new Stopwatch();

            sw.Start();
            var result = client.GetAsync(args[0]).Result;
            sw.Stop();

            Console.WriteLine($"Spent {sw.ElapsedMilliseconds}ms");
        }
    }
}

Little script to check 20 times; run with:

./runbench https://reddit.com

#!/bin/bash

for i in {1..20}
do
    dotnet run -- "$1"
    time curl -L "$1" > /dev/null
done
  • Does this also happen using the sync methods? Please provide a URL that would allow us to reproduce the difference. – Matthiee May 09 '19 at 09:21
  • Yes, it happens with DownloadString() for instance as well. Reddit.com is a good example; with HttpClient etc. I am getting ~1800ms while curl always gives me < 200ms. Again, measured from the same (gigabit pipe) server. Whatever the exact timing values, HttpClient is 5-10x slower than curl for https://reddit.com from that server (and yes, I tried different servers; same result), measured over 30 requests with each. – CharlesS May 09 '19 at 09:30
  • For me it is also 5x slower... I used this to measure the time: `(Measure-Command -Expression { $site = Invoke-WebRequest -Uri "https://reddit.com" }).Milliseconds` – Matthiee May 09 '19 at 09:53
  • Maybe this is the issue? https://github.com/dotnet/corefx/issues/37035 ; all Linux machines I'm on have IPv6. Let me test my laptop by switching it off. => nope, that's not it. – CharlesS May 09 '19 at 09:57
  • Did you try to connect more than once using the same HttpClient? Perhaps there is some initialization that takes that long, though a 1600 ms difference seems too much for any initialization code – Ilya Chernomordik May 09 '19 at 10:20
  • This code runs inside software that reuses the HttpClient. The result is exactly the same, unfortunately. We got wind of this issue because some users complained about their pages loading very slowly (this is used for an internal tool) while other pages load very fast. All have the same curl load time, but very different HttpClient load times. – CharlesS May 09 '19 at 10:25
  • I tried to access reddit.com in a browser, and it was ~1 second; are you sure you get under 200ms in curl for that? Though of course it depends on where the server is, etc. – Ilya Chernomordik May 09 '19 at 10:27
  • Ran this in PowerShell and got everything from 50ms to 1s in there... – Ilya Chernomordik May 09 '19 at 10:33
  • It probably depends on the location, but Matthiee and I both get some URLs with a 5-50x difference from the same connection/machine. You can probably find URLs from your location that show the same discrepancy. – CharlesS May 09 '19 at 10:36
  • @IlyaChernomordik for me using WebClient from code was ~5x slower than PowerShell – Matthiee May 09 '19 at 10:38
  • I assume there was no proxy/interception involved that affects only the dotnet application, and you did not run it under the debugger either? (I guess debug vs release won't do much, but you can try to compile it in Release and check) – Ilya Chernomordik May 09 '19 at 10:56
  • @CharlesS try this https://stackoverflow.com/a/4914874/6058174 ; it is not as fast as curl, but a lot faster than HttpClient, at least for me. – Matthiee May 09 '19 at 10:59
  • @IlyaChernomordik I switched off the proxy in HttpClient and the networking is the same (same server). The end-user software is compiled in Release and the issue is the same. – CharlesS May 09 '19 at 11:04
  • @CharlesS did you *check* the results? Trying your code returns `we're sorry, but you appear to be a bot and we've seen too many requests from you lately.` Reddit accepts only 1 request every 2 seconds, which means you're probably measuring the instantaneous rejection response instead of an actual response – Panagiotis Kanavos May 09 '19 at 11:13
  • @PanagiotisKanavos Reddit is just one example; there are many other sites where this happens, and that bot response also comes in much slower via HttpClient than curl for me. – CharlesS May 09 '19 at 11:15
  • @CharlesS and that example returns a static page to `curl`. Perhaps the others do too. Did you actually check the response? – Panagiotis Kanavos May 09 '19 at 11:17
  • @PanagiotisKanavos I did (and verified again, as you made me doubt); the results are the same for both requests, content- and header-wise; it's just the timings that are not. – CharlesS May 09 '19 at 11:25
  • @CharlesS an HTTP library can't be slow for *some* servers but not others. Perhaps the response was rejected. There's no user agent string in the `curl` call, which definitely affects how servers respond. – Panagiotis Kanavos May 09 '19 at 11:25
  • @CharlesS I, on the other hand, can confirm they are not. `curl -L "reddit.com"` returns a rejection every time, explaining that there's a rate limit and the user agent must be filled. On the other hand `Invoke-WebRequest -Uri "reddit.com"` returns a proper response – Panagiotis Kanavos May 09 '19 at 11:29
  • @PanagiotisKanavos I don't know what to tell you; curl -L https://reddit.com gives me the reddit home page. It sometimes gives me that error, but at other times it gives me the home page, and curl is much faster than HttpClient. However, let's not stare blindly at reddit; let me find a few other sites I have the issue with. – CharlesS May 09 '19 at 11:38
  • Thanks to @PanagiotisKanavos (and the rest, for being patient) I managed to properly diagnose the problem; it was indeed static vs dynamic content, but my target was not Reddit (that was just an example it happened to as well; I do get the home page like I said, and it is a lot slower than curl there, though not as extreme, so I think that was just 'luck'). The sites that I do target with the tool I cannot divulge as they are internal, but when I started checking, looking for a URL to pass to PanagiotisKanavos, I found that all the slow ones are Wordpress. – CharlesS May 09 '19 at 12:38
  • So then I looked closely at the content, and I saw that when fetching with HttpClient, WP Cache is disabled, while when fetching with curl, WP Cache is working! So it turns out (which is really crappy IMHO) that when you do not send a user-agent, WP Cache does not cache... Thanks for all the help! – CharlesS May 09 '19 at 12:39
  • @CharlesS I found another interesting thing too: reddit detects your locale and returns *different* content based on the headers. On Windows, `Invoke-WebRequest` gets its defaults from the `Internet Options`, so I got Greek results while Chrome, which I've set to US like all developers do, returned US content. And curl called from WSL returned the rejection page – Panagiotis Kanavos May 09 '19 at 12:46
  • @CharlesS check [this question too](https://stackoverflow.com/questions/56057748/webclient-downloadstring-being-blocked-by-some-websites-and-responds-back-as-det?noredirect=1#comment98757783_56057748). In this case, the server goes out of its way to detect bots and returns a canned robot response, even popping up a human verification page if the same IP makes too many requests. Sending the UA string isn't enough to convince it the caller is human – Panagiotis Kanavos May 09 '19 at 12:47

1 Answer


The issue was resolved; it was just a combination of factors that caused a large portion of the target sites not to have their content cached. It had nothing to do with HttpClient (besides HttpClient not sending a user-agent by default).

Read the comments for more information.
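
For completeness, a minimal sketch of the fix this implies: send an explicit User-Agent header so that server-side caches (WP Cache in this case) actually kick in. The class name and header value are illustrative, not from the actual tool:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class UserAgentFix
{
    static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        // Without a User-Agent some servers (e.g. WP Cache setups) bypass their
        // cache and render the page from scratch, which is what made HttpClient
        // look so much slower than curl here.
        client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (compatible; InternalTool/1.0)");

        var response = await client.GetAsync(args[0]);
        Console.WriteLine($"{(int)response.StatusCode} {response.ReasonPhrase}");
    }
}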
