I cannot use the Google API for this project, but I need to make a simple Google query. I do this by enabling SSL3 and TLS 1.2 for WebClient, setting the headers manually (I am not sure whether this helps), and sending a plain GET request. For some reason this takes 10 seconds, whereas a request to Stack Overflow takes only 3 seconds. In Chrome, both pages load instantly. What is the bottleneck in WebClient, and how can I make HTTPS GET requests as fast as Chrome does?

Second question: if the page contains JavaScript, how can I execute it on the retrieved "document" without using a web browser and rendering the whole page?

Any help appreciated.

EDIT: Removing the header-modifying code speeds it up, but Google is still incredibly slow. I presume they do this intentionally. Is there any way around this?

// in Main
WebCrawler wc = new WebCrawler();
string page = wc.load("https://stackoverflow.com/questions/20064505/requesting-html-over-https-with-c-sharp-webclient");
page = wc.load("https://www.google.com/maps?q=computer+shops+near+me&rlz=1C1GCEA_enZA855ZA855&um=1&ie=UTF-8&sa=X&ved=0ahUKEwi1lY-c4eDjAhUtWhUIHf8DDKUQ_AUIEigB");

...
// WebCrawler class
class WebCrawler
{
    WebClient webClient;

    public WebCrawler()
    {
        webClient = new WebClient();
        ServicePointManager.ServerCertificateValidationCallback += ValidateRemoteCertificate;
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
        ServicePointManager.Expect100Continue = true;
        // Note: this assignment replaces the Ssl3 value set above, so only TLS 1.2 stays enabled.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    }

    public string load(string uri)
    {
        Uri address = new Uri(uri);

        // Manually set headers (removing these speeds things up, see EDIT above).
        webClient.Headers.Set(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36");
        webClient.Headers.Set(HttpRequestHeader.Referer, "https://www.google.com/");
        //    webClient.Headers.Set(HttpRequestHeader.Cookie,

        // Read the whole response body as a string.
        var stream = webClient.OpenRead(address);
        using (StreamReader sr = new StreamReader(stream))
        {
            var page = sr.ReadToEnd();
            return page;
        }
    }

    // Accepts the certificate only when there are no policy errors; logs and rejects otherwise.
    private static bool ValidateRemoteCertificate(object sender, X509Certificate cert, X509Chain chain, SslPolicyErrors error)
    {
        if (error == System.Net.Security.SslPolicyErrors.None)
        {
            return true;
        }

        Console.WriteLine("X509Certificate [{0}] Policy Error: '{1}'",
            cert.Subject,
            error.ToString());

        return false;
    }
}
Chandresh Khambhayata
BinkyNichols

1 Answer

Don't use WebClient. Instead, use HttpClient or HttpWebRequest and set AutomaticDecompression to GZip | Deflate.

You can set AutomaticDecompression with the following line (where req is an HttpWebRequest, for example):

req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

With this setting, an HTTP header called Accept-Encoding is sent with the value gzip, deflate, which asks the server to send the content in compressed form. The response is therefore smaller and takes less time to download. HttpWebRequest takes care of decompressing the data sent by the server.
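
As a rough illustration of the setting above, here is a minimal sketch of a complete GET request with HttpWebRequest. The names CompressedFetcher, CreateRequest and Load are hypothetical, made up for this example; it also assumes TLS 1.2 is already enabled (for instance through ServicePointManager.SecurityProtocol, as in the question) if your framework version needs that.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, not part of any library.
public static class CompressedFetcher
{
    // Builds the request without sending it; nothing touches the network here.
    public static HttpWebRequest CreateRequest(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        // Advertise gzip/deflate support and let the framework decompress the response.
        req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return req;
    }

    // Sends the GET request and returns the already decompressed body as a string.
    public static string Load(string url)
    {
        HttpWebRequest req = CreateRequest(url);
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling CompressedFetcher.Load("https://stackoverflow.com/") would then return the page source as a string. Whether this actually shortens the request depends on how much of the time is spent downloading the body rather than on connection setup.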

The same approach applies to HttpClient: set AutomaticDecompression on the HttpClientHandler that you pass to the HttpClient constructor.
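
For HttpClient, a comparable sketch might look like the following. The names CompressedHttpClient, CreateHandler and LoadAsync are hypothetical; the single shared client instance is a design choice of this example (so that connections can be reused across requests), not something the question's code does.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class CompressedHttpClient
{
    // Creates a handler that requests gzip/deflate and decompresses responses automatically.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    // One shared client, so repeated requests can reuse open connections.
    private static readonly HttpClient Client = new HttpClient(CreateHandler());

    // Sends a GET request and returns the decompressed body as a string.
    public static Task<string> LoadAsync(string url)
    {
        return Client.GetStringAsync(url);
    }
}
```

From an async method this would be used as string page = await CompressedHttpClient.LoadAsync(url);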

Youssef13
  • Does this support https? – BinkyNichols Aug 01 '19 at 12:19
  • @BinkyNichols, yes. You can use `HttpWebRequest` or `HttpClient` and set `AutomaticDecompression` with https requests. – Youssef13 Aug 01 '19 at 12:40
  • This doesn't improve the time at all. The issue is that when I load the same page in a WebBrowser control it takes about 1 second, yet a plain GET request takes 10 seconds. – BinkyNichols Aug 01 '19 at 13:33
  • Were you able to get this sped up? I implemented the following two settings: `ServicePointManager.Expect100Continue = true;` `ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;` and my webclient speed dropped a couple of seconds from what it was. But I have to declare TLS1.2 or else it attempts to use something else and the web request fails. – Joseph Michael Oct 31 '19 at 18:49