
I'm trying to retrieve the source of a site with WebClient.DownloadString, but when I inspect the resulting string in the debugger, part of the HTML source appears to be cut off.

Text visualiser in VS: https://i.imgur.com/AWiTTqI.png

Browser debug: (screenshot not included)

Code:

// Requires: using System.Net; using System.Text;
public string GetWebpageSource()
{
    using (WebClient client = new WebClient())
    {
        // Send a browser-like User-Agent; some sites reject requests without one.
        client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:44.0) Gecko/20100101 Firefox/44.0";
        client.Encoding = Encoding.UTF8;
        string htmlcode = client.DownloadString("http://2007.runescape.wikia.com/wiki/Bandos%20page%201");
        return htmlcode;
    }
}

So I'm wondering why it does that. If additional info is needed, I will post it. Thanks for reading!

Denny
  • This might help a bit [stackoverflow](http://stackoverflow.com/questions/19577586/webclient-downloadstring-not-returning-anything) – Isuru Feb 08 '16 at 15:24
  • I wonder if this isn't an optimization in the visualizer. Opening unbounded lengths of text can be an unexpectedly greedy operation. – spender Feb 08 '16 at 15:25
  • @Isuru I already set a User-Agent header in my request – Denny Feb 08 '16 at 15:26
  • @spender Pretty sure it isn't. I'm trying to extract part of the HTML with a regex and it can't find it, even though it exists in the HTML the browser shows – Denny Feb 08 '16 at 15:27
  • Try saving the text to a file and viewing it there instead of in the text visualizer. Regexes can have problems because of line breaks. – Qwertiy Feb 08 '16 at 15:28
  • Possibly [this known bug](https://connect.microsoft.com/VisualStudio/feedback/details/2016177/text-visualizer-misses-corrupts-text-in-long-strings); use File.WriteAllText to verify. Also, FYI, Wikia has an API for data retrieval. Regex + HTML is rarely the best choice. – Alex K. Feb 08 '16 at 15:29
  • I can't reproduce this in LinqPad. @AlexK. is right on the money. Visualizer issue. – spender Feb 08 '16 at 15:31
  • So writing the content to a file works. But is it really necessary to write it to a file first? As for the API, I've already looked through it, but couldn't find a call that retrieves the data of the wiki page itself. – Denny Feb 08 '16 at 15:37
  • @Denny You misunderstand. Your data is fine; it is the debugger's visualizer that has led you to the incorrect assumption that WebClient is broken. There is no problem with the downloading of the string, and writing it to a file proves that. Now you need to figure out why your regex fails. It's not for the reason you thought. – spender Feb 08 '16 at 15:40
  • Writing to a file just allowed you to verify that `htmlcode` is correct and the text visualizer is wrong; you don't need to do it every time, so go ahead and use the string. – Alex K. Feb 08 '16 at 15:40

1 Answer


Thanks to the people from SO I've found the 'problem'. The text visualiser in VS made it look as if the text was cut off, but when I wrote the source to a file, the whole page was there. I had assumed the download was incomplete only because of what the text visualiser showed. So the lesson I've learned is: do NOT trust the text visualiser!
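For anyone hitting the same thing, here is a minimal sketch of how the string can be checked without relying on the visualiser. It is only an illustration: the `DownloadCheck` class, its helper names, the `page-source.html` file name, and the stand-in string are all made up for this example, and in real code `htmlcode` would come from `client.DownloadString(...)`.

```csharp
using System;
using System.IO;
using System.Text;

public static class DownloadCheck
{
    // Returns true when the marker occurs anywhere in the downloaded text.
    public static bool ContainsMarker(string html, string marker)
    {
        return html.IndexOf(marker, StringComparison.Ordinal) >= 0;
    }

    // Writes the full string to a temp file so it can be opened in an editor.
    // The file name is arbitrary (an assumption for this sketch).
    public static string DumpToTempFile(string html)
    {
        string path = Path.Combine(Path.GetTempPath(), "page-source.html");
        File.WriteAllText(path, html, Encoding.UTF8);
        return path;
    }

    public static void Main()
    {
        // Stand-in for the result of client.DownloadString(...): a long
        // string whose interesting part sits near the end.
        string htmlcode = "<html>" + new string('x', 200000)
                        + "<div id=\"target\">loot</div></html>";

        // These checks look at the real string, not at what a debugger window renders.
        Console.WriteLine("Length: " + htmlcode.Length);
        Console.WriteLine("Contains marker: " + ContainsMarker(htmlcode, "id=\"target\""));
        Console.WriteLine("Tail: " + htmlcode.Substring(htmlcode.Length - 20));
        Console.WriteLine("Written to: " + DumpToTempFile(htmlcode));
    }
}
```

Checking the length, the tail, and a known marker is usually enough; writing the file is only needed when you want to read the whole page in an editor.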

Debugging further from the text file, I was able to solve my problems :)
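One thing worth checking when a regex fails on downloaded HTML is the line-break issue raised in the comments: by default `.` does not match `\n` in .NET regexes. The sketch below shows the difference `RegexOptions.Singleline` makes. The HTML fragment, the pattern, and the `RegexLineBreaks` class are invented for illustration and are not taken from the actual wiki page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexLineBreaks
{
    // Returns the first capture group, or null when the pattern does not match.
    public static string ExtractFirstGroup(string html, string pattern, RegexOptions options)
    {
        Match m = Regex.Match(html, pattern, options);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical fragment where the wanted text spans several lines.
        string html = "<td class=\"drop\">\nBandos page 1\n</td>";
        string pattern = "<td class=\"drop\">(.*?)</td>";

        // Default options: '.' stops at '\n', so the pattern cannot reach </td>.
        string withoutOption = ExtractFirstGroup(html, pattern, RegexOptions.None);

        // Singleline: '.' also matches '\n', so the group can span lines.
        string withOption = ExtractFirstGroup(html, pattern, RegexOptions.Singleline);

        Console.WriteLine(withoutOption == null ? "no match" : withoutOption.Trim());
        Console.WriteLine(withOption == null ? "no match" : withOption.Trim());
    }
}
```

As also noted in the comments, regex over HTML is fragile in general, so this is only a quick diagnostic, not a recommendation to parse pages this way.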

Denny