
I want the logic to get all page URLs from a website: if I provide a website URL, I should get back all of that site's page URLs in a collection. How can I implement this in C#?

Jonesopolis

2 Answers


While this is not a trivial task, you'd best start with the Html Agility Pack.

It allows you to search for HTML tags even when the markup is invalid, and it is far superior to parsing the responses manually.

As Save noted, the following answer provides a great example:

HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(/* url */);
// DocumentNode is the root of the parsed document; select every
// anchor tag that has an href attribute.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    string href = link.GetAttributeValue("href", string.Empty);
    // add href to your collection here
}

Source: https://stackoverflow.com/a/2248422/548020
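The linked example extracts the anchors from a single page, but the question asks for every page on a site, which means following the discovered links as well. A hedged sketch of a breadth-first crawl (all names here are illustrative; the fetch step is injected as a delegate so the traversal can be exercised without network access, and link extraction uses a simple regex rather than the Html Agility Pack to keep the sketch dependency-free):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class SiteCrawler
{
    // Breadth-first crawl that stays on the start URL's host.
    // 'fetch' maps a URL to its HTML; in real use it could wrap
    // HtmlWeb.Load or WebClient.DownloadString.
    public static List<string> CrawlSameHost(string startUrl, Func<string, string> fetch)
    {
        var seen = new HashSet<string> { startUrl };
        var queue = new Queue<string>(new[] { startUrl });
        var result = new List<string>();
        string host = new Uri(startUrl).Host;

        while (queue.Count > 0)
        {
            string url = queue.Dequeue();
            result.Add(url);

            string html;
            try { html = fetch(url); }
            catch { continue; } // skip pages that fail to load

            foreach (Match m in Regex.Matches(
                html, "<a[^>]+href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase))
            {
                // Resolve relative links against the current page, then
                // enqueue same-host URLs we have not visited yet.
                if (Uri.TryCreate(new Uri(url), m.Groups[1].Value, out Uri abs)
                    && abs.Host == host
                    && seen.Add(abs.AbsoluteUri))
                {
                    queue.Enqueue(abs.AbsoluteUri);
                }
            }
        }
        return result;
    }

    static void Main()
    {
        // Tiny in-memory "site" standing in for real HTTP fetches.
        var pages = new Dictionary<string, string>
        {
            ["http://example.com/"] = "<a href=\"/b\">b</a><a href=\"http://other.com/\">x</a>",
            ["http://example.com/b"] = "<a href=\"/\">home</a>",
        };
        foreach (string url in CrawlSameHost("http://example.com/",
                     u => pages.TryGetValue(u, out string h) ? h : ""))
            Console.WriteLine(url);
    }
}
```

Be aware that a real crawl also has to cope with robots.txt, redirects, and query-string duplicates; this sketch only shows the traversal shape.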

CodeZombie

You can use the WebClient or WebRequest classes to download the page's HTML:

string html;
WebRequest request = WebRequest.Create("http://www.yahoo.com");
// Dispose the response and its stream once the body has been read.
using (WebResponse response = request.GetResponse())
using (Stream data = response.GetResponseStream())
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
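Once the HTML is in a string, the links still have to be pulled out of it to build the collection the question asks for. A minimal, dependency-free sketch using a regex (fragile on unusual markup; the Html Agility Pack from the other answer is the more robust option):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class LinkExtractor
{
    // Rough sketch: pull href values out of anchor tags with a regex.
    // A regex breaks on unusual markup; the Html Agility Pack is the
    // robust choice, but this keeps the example dependency-free.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(
            html, "<a[^>]+href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    static void Main()
    {
        string html = "<a href=\"/about\">About</a> <a href='/contact'>Contact</a>";
        foreach (string url in ExtractLinks(html))
            Console.WriteLine(url); // prints /about then /contact
    }
}
```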
Sajeetharan