Folks, I'm trying to extract data from a web page using C#. For the moment I read the Stream from the WebResponse and parse it as one big string. It's long and painful. Does anyone know a better way to extract data from a web page? I saw WinHTTP, but it isn't for C#.
2 Answers
5
To download data from a web page it is easier to use WebClient:
string data;
using (var client = new WebClient())
{
    data = client.DownloadString("http://www.google.com");
}
For parsing downloaded data, provided that it is HTML, you could use the excellent Html Agility Pack library.
And here's a complete example extracting all the links from a given page:
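using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        using (var client = new WebClient())
        {
            // Download the page as a single string
            string data = client.DownloadString("http://www.google.com");

            // Parse the HTML with Html Agility Pack
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(data);

            // SelectNodes returns null when nothing matches the XPath
            var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
            if (nodes == null) return;

            foreach (HtmlNode link in nodes)
            {
                HtmlAttribute att = link.Attributes["href"];
                Console.WriteLine(att.Value);
            }
        }
    }
}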

Darin Dimitrov
Will you please explain a little more? – Aitazaz Khan Oct 30 '13 at 20:57
0
If the webpage is valid XHTML, you can read it into an XPathDocument and xpath your way quickly and easily straight to the data you want. If it's not valid XHTML, I'm sure there are some HTML parsers out there you can use.
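As a rough sketch of the XPathDocument approach, assuming the page is already available as a well-formed XHTML string: the class name `XhtmlLinks` and method name `ExtractHrefs` are made up for illustration, and the query uses `local-name()` only to sidestep the XHTML default namespace. Pages that declare a DOCTYPE may need extra reader configuration, which this sketch does not cover.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper, not part of any library.
static class XhtmlLinks
{
    // Returns the href value of every <a> element that has one.
    public static List<string> ExtractHrefs(string xhtml)
    {
        // XPathDocument needs well-formed XML; malformed markup throws XmlException.
        var doc = new XPathDocument(new StringReader(xhtml));
        XPathNavigator nav = doc.CreateNavigator();

        // local-name() matches <a> regardless of the XHTML namespace,
        // so no XmlNamespaceManager is needed for this simple query.
        XPathNodeIterator it = nav.Select("//*[local-name()='a']/@href");

        var hrefs = new List<string>();
        while (it.MoveNext())
        {
            // it.Current is positioned on the attribute node; Value is its text.
            hrefs.Add(it.Current.Value);
        }
        return hrefs;
    }
}
```

Calling `XhtmlLinks.ExtractHrefs(data)` on a downloaded page string would give the link targets in document order, provided the markup really is well-formed; for ordinary HTML an HTML parser is the safer choice.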
I found a similar question with an answer that should help: "Looking for C# HTML parser".