
Is this the best way to get a webpage when scraping?

using System.Net;
using HtmlAgilityPack;

var oReq = (HttpWebRequest)WebRequest.Create(url);
// Dispose the response and stream so the connection is released
using (var resp = (HttpWebResponse)oReq.GetResponse())
using (var stream = resp.GetResponseStream())
{
    var doc = new HtmlDocument();
    doc.Load(stream);
    var element = doc.GetElementbyId("start-left");   // takes a plain id, not an XPath
    var element2 = doc.DocumentNode.SelectSingleNode("//body");
    string html = doc.DocumentNode.OuterHtml;
}

I've seen HtmlWeb().Load used to get a webpage. Is that a better alternative for loading and then scraping the webpage?


OK, I'll try that instead.

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);

Now that I have my doc, it doesn't expose many members. There is nothing like SelectSingleNode; the only one I can use is GetElementbyId, and that works, but I want to select elements by class.

Do I need to do it like this?

var htmlBody = doc.DocumentNode.SelectSingleNode("//body");
htmlBody.SelectSingleNode("//paging");
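Two things are going on in the snippet above. SelectSingleNode and SelectNodes live on HtmlNode, so on a loaded HtmlDocument you reach them through doc.DocumentNode. Also, the XPath "//paging" matches elements whose tag name is paging, not elements with class="paging". A minimal sketch of selecting by class with plain XPath 1.0 follows, assuming the HtmlAgilityPack NuGet package is referenced; the ClassSelector/FindByClass names are hypothetical, chosen for illustration.

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced.
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ClassSelector
{
    // Hypothetical helper: returns every element under root whose class
    // attribute contains cssClass as a whole, space-separated token, so
    // "paging" matches class="nav paging" but not class="pagingx".
    // Assumes cssClass contains no quote characters.
    public static List<HtmlNode> FindByClass(HtmlNode root, string cssClass)
    {
        // XPath 1.0 has no token-matching function, so pad the attribute
        // with spaces and search for " cssClass ". The leading ".//" keeps
        // the search inside root rather than the whole document.
        string xpath = ".//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + cssClass + " ')]";

        // By default SelectNodes returns null (not an empty collection)
        // when nothing matches, so normalize that to an empty list.
        HtmlNodeCollection nodes = root.SelectNodes(xpath);
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }
}
```

With the doc from web.Load(url), a call such as ClassSelector.FindByClass(doc.DocumentNode, "paging") returns all matching elements in document order. A simpler contains(@class, 'paging') test also works, but it would match class names like "pagingx" too.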
Sergii Zhevzhyk
thatsIT
You can chain node selects too, e.g.: var htmlBody = doc.DocumentNode.SelectSingleNode("//body").SelectSingleNode(".//paging"); – Phill Healey Jul 23 '14 at 09:47
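When chaining selects, the scope of the second XPath matters: in HtmlAgilityPack an expression starting with "//" searches the whole document even when it is called on a child node, while ".//" searches only that node's descendants. A small sketch to demonstrate this, assuming the HtmlAgilityPack NuGet package; XPathScope/CountFrom are hypothetical names used only for illustration.

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced.
using HtmlAgilityPack;

public static class XPathScope
{
    // Hypothetical helper: selects a container with containerXPath, then runs
    // xpath from that container and returns how many nodes it matched.
    public static int CountFrom(HtmlDocument doc, string containerXPath, string xpath)
    {
        HtmlNode container = doc.DocumentNode.SelectSingleNode(containerXPath);
        if (container == null) return 0;

        // SelectNodes returns null when nothing matches (default behavior).
        HtmlNodeCollection hits = container.SelectNodes(xpath);
        return hits == null ? 0 : hits.Count;
    }
}
```

For a page with one <p> inside div#a and two inside div#b, running "//p" from div#a still finds all three paragraphs, whereas ".//p" finds only the one inside div#a.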

1 Answer


It's much easier to use HtmlWeb, which downloads and parses the page in one call:

using HtmlAgilityPack;

string Url = "http://something";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);   // fetches the URL and returns the parsed document
Jacob Proffitt
Can you please look at this link? I'm having an issue with HtmlWeb().Load(Url): it's not loading the full content of the webpage. http://stackoverflow.com/questions/18955793/why-htmlweb-loadurl-not-loading-the-page-with-full-content – BhavikKama Sep 24 '13 at 05:33