12

I requested 100 pages that all return 404. I wrote:

    {
        // Time 100 HEAD checks against URLs that are known to 404.
        var start = DateTime.Now;
        for (int i = 0; i < 100; i++)
            DL.CheckExist("http://google.com/lol" + i + ".jpg");
        var elapsed = DateTime.Now - start;
        Console.WriteLine(elapsed);
    }

static public bool CheckExist(string url)
{
    HttpWebRequest wreq = null;
    HttpWebResponse wresp = null;
    bool ret = false;

    try
    {
        wreq = (HttpWebRequest)WebRequest.Create(url);
        wreq.KeepAlive = true;   // reuse the TCP connection across requests
        wreq.Method = "HEAD";    // request headers only, not the body
        wresp = (HttpWebResponse)wreq.GetResponse();
        ret = true;              // a response means the resource exists
    }
    catch (System.Net.WebException)
    {
        // a 404 (or other protocol error) surfaces as a WebException
    }
    finally
    {
        if (wresp != null)
            wresp.Close();       // release the connection
    }
    return ret;
}

Two runs took 00:00:30.7968750 and 00:00:26.8750000. Then I tried Firefox, using the following page:

<html>
<body>
<script type="text/javascript">
for (var i = 0; i < 100; i++)
    document.write("<img src='http://google.com/lol" + i + ".jpg'><br>");
</script>
</body>
</html>

Timing it by hand, Firefox took roughly 4 seconds, which is 6.5-7.5x faster than my app. I plan to scan through thousands of files, so taking 3.75 hours instead of 30 minutes would be a big problem. How can I make this code faster? I know someone will say Firefox caches the images, but 1) it still needs to check the headers from the remote server to see whether they have been updated (which is what I want my app to do), and 2) I am not receiving the body; my code should only be requesting the header. So, how do I solve this?

Dimitri C.

8 Answers

52

I noticed that an HttpWebRequest hangs on the first request. I did some research, and what seems to be happening is that the request is configuring or auto-detecting proxies. If you set

request.Proxy = null;

on the web request object, you might be able to avoid an initial delay.

With proxy auto-detect:

using (var response = (HttpWebResponse)request.GetResponse()) //6,956 ms
{
}

Without proxy auto-detect:

request.Proxy = null;
using (var response = (HttpWebResponse)request.GetResponse()) //154 ms
{
}
Orhan Cinar
Max
  • When I put this line, request.Proxy = null;, in my code I got the result instantly! Thanks – zidane Aug 27 '10 at 10:30
  • 2
    What are the ramifications if the request needs to go through a Proxy at some client sites? Will it still know it needs to get a Proxy? – TheWommies Dec 12 '11 at 03:56
  • I don't know. It seems Proxy auto-detection is slow (or was, this is more than 2 years old now), and this disables it. My guess is that it won't correctly detect a proxy if you set this flag. – Max Dec 12 '11 at 15:00
  • Here at 1 AM, I was banging my head against the wall, debugging threaded code, trying to figure out what caused some request to take soooo long. And the `Proxy = null` to the rescue! Thanks a billion!!! – Gant Sep 19 '12 at 18:19
  • Sometimes it can help to set Expect100Continue to false to speed up web requests, if the server/service supports it: ```ServicePointManager.Expect100Continue = false;``` https://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.expect100continue%28v=vs.110%29.aspx By default the client waits for a 100 (Continue) status before sending the request body; disabling this can speed up requests (see the sketch after these comments). – juFo Nov 22 '17 at 07:46
  • Will this still use a proxy if it is set on OS level? – XaverB Feb 06 '20 at 12:24
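
A minimal sketch combining this answer with the Expect100Continue tip from the comments above (assuming the same .NET HttpWebRequest API as the question; url is a placeholder):

ServicePointManager.Expect100Continue = false; // juFo's tip from the comments

var request = (HttpWebRequest)WebRequest.Create(url);
request.Proxy = null;     // this answer's fix: skip proxy auto-detection
request.Method = "HEAD";  // headers only, as in the question
using (var response = (HttpWebResponse)request.GetResponse())
{
    // a response here means the resource exists
}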
4

Change your code to the asynchronous GetResponse pattern:

public override WebResponse GetResponse() {
    •••
    IAsyncResult asyncResult = BeginGetResponse(null, null);
    •••
    return EndGetResponse(asyncResult);
}

Async Get
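
Applied to the question's scenario, that could look something like this (a sketch, not the poster's code; CheckExistAsync and onResult are illustrative names):

using System;
using System.Net;

// Fire the HEAD request without blocking the calling thread; the
// callback reports whether the URL exists once the response arrives.
static void CheckExistAsync(string url, Action<string, bool> onResult)
{
    var wreq = (HttpWebRequest)WebRequest.Create(url);
    wreq.Method = "HEAD";
    wreq.BeginGetResponse(ar =>
    {
        bool exists = false;
        try
        {
            // EndGetResponse throws WebException for a 404, just like GetResponse
            using (var wresp = (HttpWebResponse)wreq.EndGetResponse(ar))
                exists = true;
        }
        catch (WebException) { }
        onResult(url, exists);
    }, null);
}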

Srikar Doddi
  • 2
    Yes but you can do it in 1 line of code now too. WebClient.DownloadSctringAsync http://msdn.microsoft.com/en-us/library/ms144202(VS.80).aspx – Chad Grant Apr 27 '09 at 11:56
  • If you want to implement this not just for downloading, [that may be useful](http://stackoverflow.com/a/12606963/274502) – cregox Sep 26 '12 at 17:12
2

Firefox probably issues multiple requests at once, whereas your code makes them one by one. Adding threads should speed up your program.
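
A minimal sketch of that idea (assumptions: the CheckExist method from the question, and .NET 4's CountdownEvent; CheckAll is a made-up name):

using System.Net;
using System.Threading;

static void CheckAll(string[] urls)
{
    // HttpWebRequest only opens 2 connections per host by default;
    // raise the limit so the parallel requests actually overlap.
    ServicePointManager.DefaultConnectionLimit = 8;

    using (var done = new CountdownEvent(urls.Length))
    {
        foreach (var url in urls)
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                DL.CheckExist((string)state); // the question's method, unchanged
                done.Signal();
            }, url);
        }
        done.Wait(); // block until every check has finished
    }
}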

Artelius
  • Good point. Do sites accept more than 3 threads? That would explain why a site may be 3-4 times faster but not more than 6.5x. Hmm. I'll keep this in mind and try again tonight –  Apr 16 '09 at 00:46
  • So I checked with another app, and one site I tested could handle 8 threads. That would explain it. I'll be a little embarrassed if that was the only reason. –  Apr 16 '09 at 00:52
  • Maybe Firefox only makes 3 requests at once. – Artelius Apr 16 '09 at 01:00
1

For me, the fix was simply changing HttpWebRequest/HttpWebResponse to the base WebRequest/WebResponse types. That solved the problem.
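
A sketch of that change, keeping the same HEAD logic as the question (whether it actually helps likely depends on the environment):

using System.Net;

static bool CheckExist(string url)
{
    WebRequest wreq = WebRequest.Create(url);
    wreq.Method = "HEAD";
    try
    {
        using (WebResponse wresp = wreq.GetResponse())
            return true;   // any successful response means the file exists
    }
    catch (WebException)
    {
        return false;      // 404 and similar errors land here
    }
}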

Alterin
0

Close the response stream when you are done with it: in your CheckExist(), add wresp.Close() after wresp = (HttpWebResponse)wreq.GetResponse();
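
An equivalent way to guarantee the close is a using block (a sketch of the same idea):

using (var wresp = (HttpWebResponse)wreq.GetResponse())
{
    // headers are readable here; disposing closes the response and
    // releases the connection even if an exception is thrown
}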

Sarvesh
0

OK, if you are getting status code 404 for all web pages, it may be because you are not specifying credentials, so you need to add

wreq.Credentials = CredentialCache.DefaultCredentials;

You may also come across status code 500; for that you need to specify a user agent, which looks something like the line below:

wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";

"A WebClient instance does not send optional HTTP headers by default. If your request requires an optional header, you must add the header to the Headers collection. For example, to retain queries in the response, you must add a user-agent header. Also, servers may return 500 (Internal Server Error) if the user agent header is missing."

reference: https://msdn.microsoft.com/en-us/library/system.net.webclient(v=vs.110).aspx

To improve the performance of the HttpWebRequest, you also need to add

wreq.Proxy = null;

Now the code will look like:

static public bool CheckExist(string url)
{
    HttpWebRequest wreq = null;
    HttpWebResponse wresp = null;
    bool ret = false;

    try
    {
        wreq = (HttpWebRequest)WebRequest.Create(url);
        wreq.Credentials = CredentialCache.DefaultCredentials;
        wreq.Proxy = null;
        wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";
        wreq.KeepAlive = true;
        wreq.Method = "HEAD";
        wresp = (HttpWebResponse)wreq.GetResponse();
        ret = true;
    }
    catch (System.Net.WebException)
    {
    }
    finally
    {
        if (wresp != null)
            wresp.Close();
    }
    return ret;
}

Tejaswi Pandava
0

Setting a cookie can matter too: some ASP.NET sites require AspxAutoDetectCookieSupport=1, which you can add like this (target here would be the request's Uri):

req.CookieContainer = new CookieContainer();
req.CookieContainer.Add(new Cookie("AspxAutoDetectCookieSupport", "1") { Domain = target.Host });
ashkufaraz
0

Have you tried opening the same URL in IE on the machine your code is deployed to? If it is a Windows Server machine, it is sometimes because the URL you are requesting is not in IE's list of secure sites (which HttpWebRequest works off). You'll just need to add it.

Do you have more info you could post? I've done something similar and have run into tons of problems with HttpWebRequest before, all unique, so more info would help.

BTW, calling it using the async methods won't really help in this case. It doesn't shorten the download time; it just doesn't block your calling thread, that's all.

Fung
  • I tried with IE6 and it takes roughly 5 seconds. My code using wreq.Method = "HEAD"; takes 12.5. I'll assume it's because IE is using 2 threads. That data looks close enough. –  Apr 27 '09 at 12:08
  • Noticed that you mentioned you're requesting pages that 404. I haven't done that on purpose before, but WebRequest behaviour might be different in such cases; something worth looking into. Does it take the normal 4 secs if you request an existing page using a GET? – Fung Apr 28 '09 at 01:30