Downloaded HTML does not contain elements displayed in web browser

Question

I used the following code to download a page's HTML as text:

        string requestUri = "some site";
        string html;

        using (WebClient client = new WebClient())
        {
            html = client.DownloadString(requestUri);
        }

        File.WriteAllText("C:\\html.txt", html);

However, the resulting text file does not contain any of the elements that my web browser displays. I searched for a series of keywords, but none of them appear in the downloaded HTML, even though they appear in my browser and in the browser's "Inspect element" tool.

As far as I know the downloaded HTML should contain EVERYTHING that is displayed in the browser and more.

Why is the downloaded HTML text missing virtually everything that is displayed in the browser?

You might want to read this: [What is the difference between source code and DOM?](https://stackoverflow.com/questions/29273391/what-is-the-difference-between-source-code-and-dom) — Mitya, Jan 01 '20 at 12:38
@Utkanos, I thought it'd be more of a general coding question, but OK. — mathgenius, Jan 01 '20 at 12:48
Use a WebBrowser **class** to get and render the html. Then you can parse it using the native HtmlDocument methods (GetElementById, GetElementsByTagName etc.) or pass the document `html` to HtmlAgilityPack if you prefer this parser instead. — Jimi, Jan 01 '20 at 14:29
Also look into [headless browsers](https://en.wikipedia.org/wiki/Headless_browser). Ultimately if you want to get the resultant DOM, not the server-sent source code, you'll need something that can interpret JavaScript as that's the fundamental actor between DOM and source code. — Mitya, Jan 01 '20 at 14:55
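
To make the comments above concrete, here is a minimal sketch that saves both the server-sent source and the DOM after JavaScript has run, so the two files can be compared. It assumes the Selenium.WebDriver NuGet package and a local Chrome installation, neither of which appears in the original thread. The URL and output file names are placeholders.

```csharp
using System;
using System.IO;
using System.Net;
using OpenQA.Selenium.Chrome; // assumption: Selenium.WebDriver NuGet package

class SourceVersusDom
{
    static void Main()
    {
        // Placeholder URL; substitute the page you are investigating.
        string requestUri = "https://example.com/";

        // 1. Server-sent source: exactly what WebClient receives.
        //    No JavaScript is executed at this stage.
        string source;
        using (WebClient client = new WebClient())
        {
            source = client.DownloadString(requestUri);
        }

        // 2. Rendered DOM: a headless browser loads the page and runs its scripts.
        var options = new ChromeOptions();
        options.AddArgument("--headless");

        string rendered;
        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(requestUri);

            // PageSource serializes the current DOM. Content that loads late
            // (for example via AJAX) may need an explicit wait before this line.
            rendered = driver.PageSource;
        }

        File.WriteAllText("source.html", source);
        File.WriteAllText("rendered.html", rendered);

        Console.WriteLine("Source length:   " + source.Length);
        Console.WriteLine("Rendered length: " + rendered.Length);
    }
}
```

If a keyword shows up in `rendered.html` but not in `source.html`, that content was most likely added by JavaScript after the initial response, which would explain why `DownloadString` alone does not return it.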

score 1 · Answer 1 · answered Jan 01 '20 at 13:04

I would strongly suggest HtmlAgilityPack for this.

With HtmlAgilityPack, you can simply do this:

        using HtmlAgilityPack;

        string webUrl = "http://microsoft.com";

        // HtmlWeb downloads the page and parses it into an HtmlDocument.
        var page = new HtmlWeb();
        var document = page.Load(webUrl);

        // Write the parsed document to disk.
        document.Save("test.html");

Gauravsa

While I thank you for your assistance, as this will be of use, it does not answer my question, so I cannot mark it as answer. – mathgenius Jan 01 '20 at 14:02
Wait, I take my last comment back - the operation "`page.Load`" times out, so I can't verify this actually works. – mathgenius Jan 01 '20 at 14:18
After this finally worked - Dammit, this does nothing. From what I see it's the same as my code, but 1000 times slower and requiring a clumsy third party. I wish I could take my vote back, but it's locked, now. :/ – mathgenius Jan 01 '20 at 14:26
