I want to extract all comments from the following link using HtmlAgilityPack in C#. The page that needs to be scraped is as follows:

The code that I have written is as follows:

    var getHtmlWeb = new HtmlWeb();
    var document = getHtmlWeb.Load(txtinputurl.Text);
    var aTags = document.DocumentNode.SelectNodes("./div[@class='com_user_text']");
    int counter = 1;
    if (aTags != null)
    {
        foreach (var aTag in aTags)
        {
            // Prefix each comment with its running number.
            lbloutput.Text += counter + ". " + aTag.InnerHtml + "\t" + "<br />";
            counter++;
        }
    }

The variable `aTags` is always null. I have also tried the XPath `//div[@class='newcomment_list']/ul/li/div[@class='headerwrap']/div[@class='com_user_text']`, but I still get the same result. Please help me with the correct XPath.

Thanks in advance

user3818862
  • The said elements are inside an `iframe`. Are you sure the content inside it is loaded in your code? – Marko Gresak Jul 14 '15 at 17:50
  • @MarkoGrešak The content I want to scrape is not in an iframe, it's in a div tag. – user3818862 Jul 14 '15 at 18:25
  • possible duplicate of [Scraping using Html Agility Package](http://stackoverflow.com/questions/31411942/scraping-using-html-agility-package) – LarsH Jul 14 '15 at 18:32
  • On the page, the user comments at the bottom are inside an `iframe`, and the `div` you are looking for is inside that iframe. If you don't believe me, inspect the source with your browser. But it doesn't matter whether you believe me: just output the whole HTML from within your code and check if the comments are loaded. Also, the problem with an `iframe` is that it creates its own scope, meaning that you can't just select an element within the `iframe`; you have to treat the iframe's document as the root and select relative to it, not relative to the whole page. – Marko Gresak Jul 14 '15 at 18:36
  • @MarkoGrešak I do believe you, Sir. You were right. Is there any other way to complete this task using HtmlAgilityPack? Please advise. – user3818862 Jul 14 '15 at 19:07
  • [This question](http://stackoverflow.com/questions/9110331/get-i-frame-source-using-htmlagilitypack) will probably help with your problem. – Marko Gresak Jul 14 '15 at 19:28
  • Hi @MarkoGrešak, I just wanted to know: is it possible to scrape content from the webpage after the JavaScript has loaded completely using Html Agility Pack? – user3818862 Jul 15 '15 at 17:13
  • The Html Agility Pack is a parser; it won't execute scripts. Take a look at [this answer](http://stackoverflow.com/a/10886733/1276128) or the [awesomium project](http://www.awesomium.com/). – Marko Gresak Jul 16 '15 at 07:51
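
The approach discussed in the comments can be sketched roughly as follows: first check whether the comment `div`s are present in the HTML that `HtmlWeb` actually downloaded, and if not, read the `iframe`'s `src`, load that URL as a separate document, and run the XPath against it. This is only a minimal sketch under several assumptions: the class name `IframeCommentScraper` and its method names are hypothetical, the comments are assumed to sit in the first `iframe` that has a `src` attribute, and the iframe's content is assumed to be served as plain HTML. If the comments are built by JavaScript, this will still return nothing, because Html Agility Pack does not execute scripts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class; the names are illustrative, not part of Html Agility Pack.
public static class IframeCommentScraper
{
    // Parses an HTML string into an HtmlDocument (no network access).
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Returns the absolute URL of the first iframe that has a src attribute,
    // or null if there is none. Assumption: the comments live in that iframe.
    public static string FindFirstIframeUrl(HtmlDocument doc, string pageUrl)
    {
        var iframe = doc.DocumentNode.SelectSingleNode("//iframe[@src]");
        if (iframe == null) return null;

        // Decode entities such as &amp; and resolve relative paths against the page URL.
        string src = HtmlEntity.DeEntitize(iframe.GetAttributeValue("src", ""));
        return new Uri(new Uri(pageUrl), src).AbsoluteUri;
    }

    // Collects the text of every div with class 'com_user_text' in the given document.
    // SelectNodes returns null when nothing matches, so that case yields an empty list.
    public static List<string> ExtractComments(HtmlDocument doc)
    {
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='com_user_text']");
        if (nodes == null) return new List<string>();
        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    // Tries the outer page first, then falls back to the iframe's own document.
    public static List<string> Scrape(string pageUrl)
    {
        var web = new HtmlWeb();
        HtmlDocument outer = web.Load(pageUrl);

        // Check suggested in the comments: are the comments in the outer HTML at all?
        var comments = ExtractComments(outer);
        if (comments.Count > 0) return comments;

        string frameUrl = FindFirstIframeUrl(outer, pageUrl);
        if (frameUrl == null) return comments;

        // The iframe is a separate document, so the XPath runs against its own root.
        return ExtractComments(web.Load(frameUrl));
    }
}
```

With the controls from the question, the result could be displayed with something like `lbloutput.Text = string.Join("<br />", IframeCommentScraper.Scrape(txtinputurl.Text));`.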
