
I'm trying to download a website's HTML source code in the same form that the browser shows when you right-click the page and select the corresponding option. So far I have tried:

using System.Net;

using (WebClient client = new WebClient()) // WebClient implements IDisposable
{
    client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");

    // Or you can get the file content without saving it
    string htmlCode = client.DownloadString("http://yoursite.com/page.html");
}

and

string url = "page url";
HttpClient client = new HttpClient();
using (HttpResponseMessage response = client.GetAsync(url).Result)
{
   using (HttpContent content = response.Content)
   {
      string result = content.ReadAsStringAsync().Result;
   }
}

as suggested here.

The issue I am facing is that the source code downloaded by the first version did not contain the information I need (which is present in the source code that Chrome shows). I read that what I am actually downloading may be the DOM, meaning some links and content may be rewritten, unless I have misunderstood.

While trying the second method I received a System.AggregateException at the line string result = content.ReadAsStringAsync().Result; so even if that solution would otherwise have worked, I do not know how to debug it.
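
One way to get at the underlying error, as a minimal sketch rather than a definitive fix: an AggregateException thrown by .Result wraps the real failure in its InnerExceptions, while await rethrows the original exception directly. The names PageFetcher, DownloadHtmlAsync and Describe are made up for illustration, the URL is the placeholder from the first snippet, and async Main assumes C# 7.1 or later.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class, not taken from the linked answer.
public static class PageFetcher
{
    // One HttpClient shared by the whole program.
    private static readonly HttpClient Client = new HttpClient();

    // Returns the raw HTML sent by the server.
    // EnsureSuccessStatusCode throws HttpRequestException on a non-success status.
    public static async Task<string> DownloadHtmlAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    // Builds a readable summary of an exception. An AggregateException
    // (what .Result and .Wait() throw) is flattened into its inner exceptions,
    // and any InnerException chain is appended.
    public static string Describe(Exception ex)
    {
        if (ex is AggregateException aggregate)
        {
            return string.Join("; ",
                aggregate.Flatten().InnerExceptions.Select(e => Describe(e)));
        }
        string text = ex.GetType().Name + ": " + ex.Message;
        return ex.InnerException == null
            ? text
            : text + " -> " + Describe(ex.InnerException);
    }

    public static async Task Main(string[] args)
    {
        // Placeholder URL; pass a real one as the first argument.
        string url = args.Length > 0 ? args[0] : "http://yoursite.com/page.html";
        try
        {
            string html = await DownloadHtmlAsync(url);
            Console.WriteLine(html.Length + " characters downloaded");
        }
        catch (Exception ex)
        {
            // With await this is the original exception, not a wrapper.
            Console.WriteLine(Describe(ex));
        }
    }
}
```

Describe also works in a catch block around the .Result version, where it prints the exceptions hidden inside the AggregateException. Either way, the string returned is only the HTML the server sends; anything a script adds later in the browser will not be in it.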

The plan now is to read up on the HttpClient class and meanwhile see if someone here knows the solution right away. I followed the same syntax as shown in an answer in the link, and the compiler did not report any errors. The only difference was that I pasted an actual URL.

Amogh
  • I think the problem stems from sites using things like React, which render the content not in the HTML but with JavaScript in the client's browser. You would likely have to use a library to resolve this. – Kyle DePace Aug 08 '20 at 23:26
  • @KyleDePace that's really good to know. When I try to download the source code manually I see there are three different options. Either I download the "website complete code" (not sure about the translation, I don't have the options in English), "just HTML" or as a "single file". The "just HTML" does not contain the information I need, which I assume is because of the reason you just told me. The other two do though. – Arvid Norinder Aug 09 '20 at 00:08
  • You can't do it as shown above, but the browser itself can. Here's the idea: you need to spin up a browser from C#, then inject the JavaScript to do the job. C# then gets called back when done. Problem is that was the IE way. Now they are embedding Chrome, and I never studied C# and Chrome interop. – JWP Aug 09 '20 at 02:22
  • @JohnPeters that's a great idea regardless, I'll look into how I could do the same thing in Chrome, but so far I haven't learned JavaScript. Time to change that then. – Arvid Norinder Aug 09 '20 at 16:40
