
I am trying to get web content using HtmlAgilityPack, but I am not getting the entire content.

Here is my code:

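```csharp
using System;
using System.Linq;
using HAP = HtmlAgilityPack;

using (var client = new System.Net.WebClient())
{
    // Download the page to a temporary file, then parse it with HtmlAgilityPack.
    var filename = System.IO.Path.GetTempFileName();
    client.DownloadFile("http://www.cnn.com/", filename);
    var doc = new HAP.HtmlDocument();
    doc.Load(filename);
    System.IO.File.Delete(filename); // the temp file is no longer needed

    // Print the trimmed text of every <a> element on the page.
    var root = doc.DocumentNode;
    var a_nodes = root.Descendants("a").ToList();

    foreach (var a_node in a_nodes)
    {
        Console.WriteLine();
        Console.WriteLine(a_node.InnerText.Trim());
    }
}

Console.ReadKey();
```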

Output (screenshot): https://i.stack.imgur.com/PJTV2.jpg

As you can see in the screenshot, I get the content from tabs like Entertainment, Living, etc., but nothing above them.
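The code above prints only the text of `<a>` elements. A minimal sketch, assuming the HtmlAgilityPack NuGet package and a hypothetical inline HTML string (so it runs without network access), that instead walks every text node in the document and skips `<script>`/`<style>` bodies. In real use the string could come from `client.DownloadString(url)`. Note that HtmlAgilityPack parses only the HTML the server returns and does not execute JavaScript, so anything a page inserts client-side will not be in `doc.DocumentNode`; whether that explains the missing CNN sections is an unverified guess.

```csharp
// Requires the HtmlAgilityPack NuGet package (dotnet add package HtmlAgilityPack).
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sample markup; replace with the downloaded page's HTML.
const string html = @"<html><head><title>Demo</title><script>var x = 1;</script></head>
<body><h1>Top story</h1><ul><li><a href='/ent'>Entertainment</a></li>
<li><a href='/living'>Living</a></li></ul></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

List<string> texts = VisibleTexts(doc);
foreach (var t in texts)
    Console.WriteLine(t);

// Collects the decoded, trimmed text of every non-empty text node,
// ignoring text that belongs to <script> or <style> elements.
static List<string> VisibleTexts(HtmlDocument d) =>
    d.DocumentNode.Descendants()
        .Where(n => n.NodeType == HtmlNodeType.Text)
        .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style")
        .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
        .Where(s => s.Length > 0)
        .ToList();
```

Walking text nodes, rather than calling `InnerText` on container elements, avoids printing the same text several times for nested elements.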

Any suggestions?

user3771772
  • Have you looked at any of the `HtmlAgilityPack` documentation to see if there are examples of how to navigate / filter on a particular node? Try this posting and see if it helps: http://stackoverflow.com/questions/19870116/using-htmlagilitypack-for-parsing-a-web-page-information-in-c-sharp – MethodMan Sep 15 '14 at 19:45
  • Well, I looked through some of the examples, but most of them are for a specific tag, like in an XML file. I don't want that. My goal is to get all the information from any tag. – user3771772 Sep 15 '14 at 19:48
  • Do a Google search for `C# HtmlAgilityPack get all web content`; there are lots of examples. – MethodMan Sep 15 '14 at 19:50
  • @Patrice Gahide: What am I missing here in order to get all the contents? And even if I am using only, I should still be getting all the contents, but in this case I am only getting half of them. – user3771772 Sep 15 '14 at 21:00
  • @user3771772 `doc.DocumentNode.OuterHtml` will give you the entire page content in HTML format. If this isn't what you want, explain in what form you want all the content to be. Or do you still get only half of the content using `doc.DocumentNode.OuterHtml`? – har07 Sep 16 '14 at 00:38
  • @har07 Can you give an example? Using this, I get all the contents as a string, but how do I fetch data from that string? Thanks. – user3771772 Sep 16 '14 at 13:02
  • An example to get what data? Do you want the content of each and every HTML node, e.g. `this is data that I want`? – har07 Sep 16 '14 at 13:06
  • Yes. For example, ,, etc. Does that make sense? – user3771772 Sep 16 '14 at 13:20
  • I also have the following query, but it always gives me an empty result. `xdoc` is the outer HTML. `var res = from item in xdoc.Descendants("div") where item.Attribute("class") != null && item.Element("a") != null select new { Link = item.Element("a").Attribute("href").Value, Image = item.Element("a").Element("img").Attribute("src").Value, Title = item.Elements("p").ElementAt(0).Element("a").Value, Desc = item.Elements("p").ElementAt(1).Value }; foreach (var node in res) { Console.WriteLine(node); Console.WriteLine("\n"); } Console.ReadKey();` – user3771772 Sep 16 '14 at 13:23
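The last query uses LINQ to XML members (`Attribute`, `Element`, `.Value`), so it presumably runs against an `XDocument` built from the outer HTML. Real-world HTML is often not well-formed XML, and if the markup declares the XHTML namespace, `Descendants("div")` without an `XNamespace` matches nothing; either may explain the empty result, though neither is confirmed here. A hedged alternative is to run the same selection directly on HtmlAgilityPack's nodes. The sketch assumes the markup shape the query implies (a `div` with a `class`, a child `a` wrapping an `img`, and two `p` children); the sample HTML and the null-safe fallbacks are illustrative assumptions.

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical markup matching the shape the LINQ to XML query expects.
const string html = @"<html><body>
<div class='story'><a href='/a1'><img src='/i1.jpg'></a>
<p><a href='/a1'>Headline one</a></p><p>Summary one</p></div>
<div><a href='/skip'>no class attribute</a></div>
</body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var res = doc.DocumentNode.Descendants("div")
    // Same filter as the original query: has a class attribute and a direct <a> child.
    .Where(d => d.Attributes["class"] != null && d.Element("a") != null)
    .Select(d =>
    {
        var a = d.Element("a");
        var ps = d.Elements("p").ToList();
        return new
        {
            Link = a.GetAttributeValue("href", ""),
            // "?." and "??" fall back to "" when an <img> or <p> is missing.
            Image = a.Element("img")?.GetAttributeValue("src", "") ?? "",
            Title = ps.Count > 0 ? (ps[0].Element("a")?.InnerText.Trim() ?? "") : "",
            Desc = ps.Count > 1 ? ps[1].InnerText.Trim() : ""
        };
    })
    .ToList();

foreach (var node in res)
    Console.WriteLine(node);
```

Unlike the original query, a `div` that lacks an `img` or a second `p` yields empty strings here instead of throwing a `NullReferenceException` or `ArgumentOutOfRangeException`.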
