Given a website URL, I want to retrieve the URLs of all the site's pages into a collection. How can I implement this in C#?
This is very broad and you've shown no effort towards solving your problem – Jonesopolis Apr 18 '14 at 12:16
Looks like a related or similar issue: http://stackoverflow.com/questions/2248411 – Save Apr 18 '14 at 12:24
2 Answers
While this is not a trivial task, you'd best start with the Html Agility Pack.
It allows you to search for HTML tags even when the markup is invalid, and it is far superior to parsing the responses manually.
As Save noted, the following answer provides a great example:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(/* url */);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    // Read the raw href value of each anchor tag.
    string href = link.GetAttributeValue("href", string.Empty);
}
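Collecting every page of a site means repeating this step for each discovered link. Below is a minimal sketch of a same-host, breadth-first crawler built on the pattern above (it assumes the HtmlAgilityPack NuGet package; the class and method names are illustrative, not part of any library):

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class SiteCrawler
{
    // Breadth-first crawl that stays on the start URL's host and
    // returns every page URL reachable from the start page.
    public static HashSet<string> GetAllPageUrls(string startUrl)
    {
        var visited = new HashSet<string>();
        var queue = new Queue<string>();
        var root = new Uri(startUrl);
        queue.Enqueue(root.AbsoluteUri);

        var web = new HtmlWeb();
        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            if (!visited.Add(current))
                continue; // already crawled

            HtmlDocument doc;
            try { doc = web.Load(current); }
            catch { continue; } // skip pages that fail to load

            var links = doc.DocumentNode.SelectNodes("//a[@href]");
            if (links == null)
                continue; // page has no anchors

            foreach (HtmlNode link in links)
            {
                string href = link.GetAttributeValue("href", string.Empty);
                // Resolve relative links against the current page,
                // and only follow links on the same host.
                if (Uri.TryCreate(new Uri(current), href, out Uri absolute)
                    && absolute.Host == root.Host
                    && !visited.Contains(absolute.AbsoluteUri))
                {
                    queue.Enqueue(absolute.AbsoluteUri);
                }
            }
        }
        return visited;
    }
}
```

Note this only finds pages that are actually linked from somewhere reachable; a real crawler would also want to respect robots.txt and cap the crawl depth.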

– CodeZombie
You can use WebClient or WebRequest to download the raw HTML of a page:
WebRequest request = WebRequest.Create("http://www.yahoo.com");
string html = String.Empty;
// Dispose the response and stream when done reading.
using (WebResponse response = request.GetResponse())
using (Stream data = response.GetResponseStream())
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
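This only gets you the HTML string; you still need to extract the links from it. A quick sketch using a regular expression is below (the `LinkExtractor` class is illustrative; note that regexes are fragile against malformed HTML, which is why the Html Agility Pack answer above is the more robust route):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class LinkExtractor
{
    // Pull href attribute values out of raw HTML. This handles
    // well-formed anchors with single- or double-quoted hrefs.
    public static List<string> ExtractHrefs(string html)
    {
        var result = new List<string>();
        var pattern = new Regex(
            "<a[^>]+href\\s*=\\s*[\"']([^\"']+)[\"']",
            RegexOptions.IgnoreCase);
        foreach (Match m in pattern.Matches(html))
            result.Add(m.Groups[1].Value);
        return result;
    }
}
```

For example, `ExtractHrefs("<a href=\"/about\">About</a>")` yields a list containing `/about`.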

– Sajeetharan