I used the following code to download a page's HTML as text:

        string requestUri = "some site";
        string html;

        using (WebClient client = new WebClient())
        {
            html = client.DownloadString(requestUri);
        }

        File.WriteAllText("C:\\html.txt", html);

However, the resulting text file does not contain any of the elements that my web browser displays. I searched for several keywords, but none of them appear in the downloaded HTML, even though they appear in my browser and in the browser's "Inspect Element" view.

As far as I know the downloaded HTML should contain EVERYTHING that is displayed in the browser and more.

Why is the downloaded HTML text missing virtually everything that is displayed in the browser?

mathgenius
  • Please tag the language you're using. – Mitya Jan 01 '20 at 12:38
  • You might want to read this: [What is the difference between source code and DOM?](https://stackoverflow.com/questions/29273391/what-is-the-difference-between-source-code-and-dom) – Mitya Jan 01 '20 at 12:38
  • @Utkanos, I thought it'd be more of a general coding question, but OK. – mathgenius Jan 01 '20 at 12:48
  • @HereticMonkey, yes, it does. – mathgenius Jan 01 '20 at 14:11
  • @Utkanos, thanks that answered my question. – mathgenius Jan 01 '20 at 14:11
  • Use the WebBrowser **class** to load and render the HTML. Then you can parse it using the native HtmlDocument methods (GetElementById, GetElementsByTagName etc.) or pass the document `html` to HtmlAgilityPack if you prefer that parser instead. – Jimi Jan 01 '20 at 14:29
  • Also look into [headless browsers](https://en.wikipedia.org/wiki/Headless_browser). Ultimately if you want to get the resultant DOM, not the server-sent source code, you'll need something that can interpret JavaScript as that's the fundamental actor between DOM and source code. – Mitya Jan 01 '20 at 14:55
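
To check whether a page falls into the case the comments above describe (content built by JavaScript rather than sent in the source), a quick test on the downloaded string can help. The following is only a minimal sketch: the class name `SourceVsDomCheck`, the `SampleHtml` string, and the keyword "Welcome" are made up for illustration and stand in for whatever `DownloadString` returned and whatever text you see in the browser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SourceVsDomCheck
{
    // Hypothetical stand-in for the string returned by DownloadString:
    // an empty container plus a script that would fill it in at runtime.
    public const string SampleHtml =
        "<html><body><div id=\"app\"></div>" +
        "<script src=\"/bundle.js\"></script></body></html>";

    // True if the keyword occurs anywhere in the raw, server-sent markup.
    public static bool ContainsKeyword(string html, string keyword)
    {
        return html.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Counts opening <script> tags (closing tags start with "</" and are not matched).
    public static int CountScriptTags(string html)
    {
        return Regex.Matches(html, @"<script\b", RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        // "Welcome" is an assumed example of text visible only in the rendered page.
        Console.WriteLine(ContainsKeyword(SampleHtml, "Welcome")); // prints "False"
        Console.WriteLine(CountScriptTags(SampleHtml));            // prints "1"
    }
}
```

If the keyword is missing from the source while script tags are present, the visible content is most likely added to the DOM after the scripts run, which a plain HTTP download never does.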

1 Answer

I would strongly suggest HtmlAgilityPack for this. With it, you can simply do this:

    using HtmlAgilityPack;

    string webUrl = "http://microsoft.com";

    var page = new HtmlWeb();
    // Download and parse the server-sent HTML.
    var document = page.Load(webUrl);
    // Save the parsed document to disk.
    document.Save("test.html");

Gauravsa
  • While I thank you for your assistance, as this will be of use, it does not answer my question, so I cannot mark it as answer. – mathgenius Jan 01 '20 at 14:02
  • Wait, I take my last comment back - the operation "`page.Load`" times out, so I can't verify this actually works. – mathgenius Jan 01 '20 at 14:18
  • After this finally worked - Dammit, this does nothing. From what I see it's the same as my code, but 1000 times slower and requiring a clumsy third party. I wish I could take my vote back, but it's locked, now. :/ – mathgenius Jan 01 '20 at 14:26