How can I download HTML source in C#

Question

How can I get the HTML source for a given web address in C#?

score 198 · Accepted Answer · edited Mar 10 '23 at 07:36

You can download files with the WebClient class:

using System.Net;

using (WebClient client = new WebClient()) // WebClient implements IDisposable
{
    client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");

    // Or you can get the file content without saving it
    string htmlCode = client.DownloadString("http://yoursite.com/page.html");
}

edited Mar 10 '23 at 07:36

Hakan Fıstık

answered Mar 01 '09 at 05:03

Christian C. Salvadó

Should note: if more control is needed, look at the HttpWebRequest class (e.g. being able to specify authentication). – Richard Mar 01 '09 at 15:12
Yes, HttpWebRequest gives you more control, although you can do POST requests with WebClient, using client.UploadData(uriString,"POST",postParamsByteArray); – Christian C. Salvadó Mar 01 '09 at 17:51
Wouldn't it be prudent to catch WebException's around this? Maybe that was assumed. Any other exceptions or errors need to be caught with this method? – John Washam Feb 21 '14 at 21:50
@JohnWasham - yes, it would be prudent to catch exceptions here. Thankfully however, most StackOverflow respondents keep example code as clear and concise as possible. Making example code closer to "real life" would just add noise. – Chris Rogers Mar 04 '15 at 02:49
The issue I face is that when I download the page source of a website that is in another language, the non-English values are missing or garbled in the result. – Rush.2707 Dec 16 '16 at 09:31
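
Following up on the comments about catching exceptions, here is a minimal sketch of what that could look like around the accepted answer's code. The class and method names (PageDownloader, TryDownloadString) are hypothetical, and returning null on failure is only one possible policy.

```csharp
using System;
using System.Net;

static class PageDownloader
{
    // Hypothetical helper: returns the page source, or null if the download failed.
    public static string TryDownloadString(string url)
    {
        try
        {
            using (var client = new WebClient())
            {
                return client.DownloadString(url);
            }
        }
        catch (WebException ex)
        {
            // Status describes the kind of failure (name resolution, timeout, protocol error, ...).
            Console.Error.WriteLine($"Download failed ({ex.Status}): {ex.Message}");

            // For HTTP-level errors the response, when present, carries the status code.
            if (ex.Response is HttpWebResponse response)
            {
                Console.Error.WriteLine($"HTTP status: {(int)response.StatusCode}");
            }
            return null;
        }
    }
}
```

Usage would be something like string html = PageDownloader.TryDownloadString("http://yoursite.com/page.html"); followed by a null check. Real code may prefer to let the exception propagate instead.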

score 42 · Answer 2 · edited Sep 01 '21 at 00:42

Basically:

using System;           // Console
using System.IO;        // StreamReader
using System.Net;
using System.Net.Http;  // in LINQPad, also add a reference to System.Net.Http.dll

WebRequest req = HttpWebRequest.Create("http://google.com");
req.Method = "GET";

string source;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    source = reader.ReadToEnd();
}

Console.WriteLine(source);

edited Sep 01 '21 at 00:42

John Smith

answered Mar 01 '09 at 05:08

Diego Jancic

score 34 · Answer 3 · edited Mar 29 '23 at 18:35

The most up-to-date answer
This question is really old (it was already 7 years old when I answered it), so none of the other answers uses the newer, recommended way, which is the HttpClient class.

HttpClient is considered the new API and should replace the old ones (WebClient and WebRequest).

HttpClient client = new HttpClient();   // ideally create one instance and reuse it for the whole application
string page = await client.GetStringAsync("page URL here");

A longer way:

string url = "page url";
HttpClient client = new HttpClient();   // ideally create one instance and reuse it for the whole application
using (HttpResponseMessage response = await client.GetAsync(url))
{
   using (HttpContent content = response.Content)
   {
      string pageContent = await content.ReadAsStringAsync();
   }
}
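
To make the "only one object per application" remark concrete, here is a minimal sketch that keeps a single shared HttpClient in a static field. The class and method names (HtmlFetcher, GetPageAsync) are hypothetical, and the call to EnsureSuccessStatusCode is an addition that the snippets above do not make.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

static class HtmlFetcher
{
    // One shared instance, reused by every request in the application.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> GetPageAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            // Throws HttpRequestException when the status code is not 2xx.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

From an async method this could be called as string html = await HtmlFetcher.GetPageAsync("http://yoursite.com/page.html");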

edited Mar 29 '23 at 18:35

answered Jan 21 '17 at 10:45

Hakan Fıstık

Suggestion: await the async methods. – Maarten Sep 15 '18 at 16:12
@Maarten the following link shows how to use this with async/await https://stackoverflow.com/questions/33020657/how-to-replace-webclient-with-httpclient/33031778#33031778 – Hakan Fıstık Dec 06 '19 at 09:35
any advantage of using async calls here? – Gary Bao 鲍昱彤 Jan 23 '21 at 01:39
I think it is always recommended to use async whenever it is possible because this could take time, and you do not want to block the thread with the Wait() call – Hakan Fıstık Jan 23 '21 at 09:51
Thank you. Using `HttpClient` is much faster than `WebClient`. – Toni Apr 27 '21 at 14:59
@MartinSchneider thank you for the note, it was the overloads with `CancellationToken` that is supported only from .NET 5 and up. I updated the answer, thank you. – Hakan Fıstık Mar 29 '23 at 18:33

score 17 · Answer 4 · edited Sep 01 '21 at 00:44

You can get the HTML source with:

var html = new System.Net.WebClient().DownloadString(siteUrl);

edited Sep 01 '21 at 00:44

John Smith

answered Jan 15 '13 at 14:40

Xenon

Short and sweet! I found your suggestion after I read Joe Albahari's example. LINQPad > Help > What's New, and search for Cache. – Colin Jul 28 '13 at 01:42
var html = new System.Net.WebClient().DownloadString(siteUrl); // need to new up your client! – Banoona Aug 11 '14 at 10:50
Does that `Dispose` the `WebClient`? – J D Mar 02 '16 at 00:56

score 11 · Answer 5 · edited Mar 22 '19 at 18:00

@cms's way is the more recent one and the one suggested on the Microsoft website, but I ran into a hard problem with both methods posted here, so I am posting the solution for everyone.

Problem: if you use a URL like www.somesite.it/?p=1500, in some cases you get an internal server error (500), even though the same URL works perfectly in a web browser.

Solution: move the parameters out of the URL. Working code:

using System.Net;
//...
using (WebClient client = new WebClient())
{
    client.QueryString.Add("p", "1500"); // add parameters
    string htmlCode = client.DownloadString("http://www.somesite.it"); // the address needs a scheme such as http://
    //...
}

See the official documentation.
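
If you are not using WebClient, a similar separation of address and parameters can be sketched with UriBuilder. This is not what the answer above does, and whether it avoids the same server error is untested. The names QueryUrl and Build are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QueryUrl
{
    // Hypothetical helper: appends URL-encoded query parameters to a base address.
    public static Uri Build(string baseAddress, IDictionary<string, string> parameters)
    {
        var builder = new UriBuilder(baseAddress);
        builder.Query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return builder.Uri;
    }
}
```

For example, QueryUrl.Build("http://www.somesite.it", new Dictionary<string, string> { { "p", "1500" } }) produces a Uri that can be passed to either WebClient or HttpClient.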

edited Mar 22 '19 at 18:00

Community

answered Jan 20 '11 at 18:11

Xilmiki

Please be careful when using DownloadString because it breaks the encoding if the website is not using UTF-8. Use instead DownloadData method and handle the encoding part too. – Alexandru Dicu May 13 '21 at 06:40
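
A minimal sketch of the DownloadData approach from this comment, assuming you already know the page's encoding (for example from its meta charset declaration). The names EncodedDownload, Decode and Download are hypothetical.

```csharp
using System.Net;
using System.Text;

static class EncodedDownload
{
    // Hypothetical helper: decodes raw bytes with an encoding named by the caller.
    public static string Decode(byte[] data, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(data);
    }

    // Downloads the raw bytes and decodes them explicitly instead of relying on DownloadString.
    public static string Download(string url, string encodingName)
    {
        using (var client = new WebClient())
        {
            return Decode(client.DownloadData(url), encodingName);
        }
    }
}
```

For example, EncodedDownload.Download("http://yoursite.com/page.html", "iso-8859-1"). On .NET Core and .NET 5+, legacy code pages such as windows-1252 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).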
