I have a method which returns the content of a webpage:

        private string FetchHTML(string sUrl, Encoding encoding)
        {
            System.Net.WebClient oClient = new System.Net.WebClient();
            oClient.Encoding = encoding;
            return System.Web.HttpUtility.HtmlDecode(oClient.DownloadString(sUrl));
        }

But when I try to load a link from livejournal (for instance, http://mos-jkh.livejournal.com/769579.html) then I am getting this exception at DownloadString:

The request was aborted: The operation has timed out.

Is it a known issue? Why doesn't DownloadString work for some webpages and is there a solution for this? Or is there an alternative to DownloadString?

Zoltan Kochan
  • Can you give an example of a URL that times out besides the link you've posted? Perhaps something less reputable than livejournal? It would also help to see the specific code you're using to call `FetchHTML` (maybe with any variables replaced with the values they represent). – M.Babcock Jan 11 '12 at 05:24

2 Answers

Some websites are smart enough to check whether the request is made by a browser. When they detect that it wasn't, they simply don't respond. But it's easy to fool them by sending user agent info with the request. So the solution was adding a single line of code to the FetchHTML method:

    private string FetchHTML(string sUrl, Encoding encoding)
    {
        System.Net.WebClient oClient = new System.Net.WebClient();
        oClient.Encoding = encoding;
        // set the user agent to IE6
        oClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)");
        return System.Web.HttpUtility.HtmlDecode(oClient.DownloadString(sUrl));
    }
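
Since the question also asks for an alternative to DownloadString: a sketch of the same fetch using HttpWebRequest (which WebClient wraps internally) is below. It's untested against the URL in the question, but it gives explicit control over the user agent and the request timeout, which WebClient doesn't expose directly.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class Fetcher
{
    // Alternative to WebClient.DownloadString using HttpWebRequest,
    // with an explicit user agent and timeout.
    static string FetchHtml(string url, Encoding encoding)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
        request.Timeout = 30000; // milliseconds

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader =
                   new StreamReader(response.GetResponseStream(), encoding))
        {
            return System.Web.HttpUtility.HtmlDecode(reader.ReadToEnd());
        }
    }
}
```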

PS: To diagnose the issue I used Fiddler rather than Wireshark, which I found too complex.

Zoltan Kochan
Well, the exception says that the operation timed out. That's a pretty reasonable thing to happen sometimes: there can be slow servers, slow internet connections, etc. And if you're downloading multiple pages from the same host, connection pooling can cause this even when each individual request looks okay.

Use something like Wireshark to work out what's going on at the network level.
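
If the trace shows the server is merely slow to respond, one option (a sketch, not something verified against the URL in the question) is to raise WebClient's default timeout of 100 seconds by subclassing it and overriding GetWebRequest:

```csharp
using System;
using System.Net;

// Sketch: WebClient doesn't expose its timeout, but the underlying
// WebRequest does, so a subclass can override it.
public class PatientWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        request.Timeout = 300000; // 5 minutes, in milliseconds
        return request;
    }
}
```

A PatientWebClient can then be used in place of WebClient in the FetchHTML method above.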

Jon Skeet