
I want to get the first-level elements of an HTML file by parsing it with HTML Agility Pack. For example, the result would look like this:

<html>
  <body>

     <div class="header">....</div>
     <div class="main">.....</div>
     <div class="right">...</div>
     <div class="left">....</div>
     <div class="footer">...</div>

   </body>
</html>

Each of these contains other tags. I want to extract all the text on the website, but separately: for example, the right side on its own, the left side on its own, the footer, and so on.

Can anyone help me?

Thanks.

Homa Shafiei
  • But what have you tried? – Anirudha Aug 19 '13 at 16:44
  • Do you have specific HTML to parse, or do you expect any HTML page to have this structure? Also, what do you mean by extracting text? Can you give a sample? – Sergey Berezovskiy Aug 19 '13 at 16:56
  • @lazyberezovsky: Yes, any HTML page. It means the text without tags. – Homa Shafiei Aug 19 '13 at 17:15
  • As the asker mentions in a comment on my answer, this question is not about one specific website but about whatever URL you supply, so a single solution cannot cover it. – Daniel B Aug 19 '13 at 19:28

1 Answer


Use HtmlAgilityPack to load the webpage from the given URL, then parse it by selecting the tags you need.

using HtmlAgilityPack;

// HtmlWeb downloads the page and returns the parsed document.
HtmlWeb page = new HtmlWeb();
HtmlDocument doc = page.Load("http://www.google.com");

If you want to select a specific div with the class name 'main', you do so through the DocumentNode property of your document object. Note that SelectSingleNode returns null when nothing matches.

string mainText = doc.DocumentNode.SelectSingleNode("//div[@class=\"main\"]").InnerText;

Chances are, though, that several tags in your HTML belong to the 'main' class, so you either have to select them all and iterate over the collection, or be more precise when you select your single node.

To get a collection of all tags in a class, e.g. 'main', use the DocumentNode.SelectNodes method instead.
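As a minimal sketch of the SelectNodes approach: the sample markup below is made up to resemble the question's layout, and the second query assumes that "first-level elements" means the element children of <body>. It is an illustration under those assumptions, not a general solution for arbitrary pages.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup modelled on the question; a real page would come from HtmlWeb.Load.
        string html = @"<html><body>
            <div class='header'>Header text</div>
            <div class='main'>Main <b>text</b></div>
            <div class='footer'>Footer text</div>
        </body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection mains = doc.DocumentNode.SelectNodes("//div[@class='main']");
        if (mains != null)
        {
            foreach (HtmlNode node in mains)
                Console.WriteLine(node.InnerText.Trim());
        }

        // Assumption: "first-level elements" means the element children of <body>.
        HtmlNodeCollection sections = doc.DocumentNode.SelectNodes("/html/body/*");
        if (sections != null)
        {
            foreach (HtmlNode section in sections)
            {
                // InnerText concatenates the text of all descendants, without tags.
                string cls = section.GetAttributeValue("class", "(no class)");
                Console.WriteLine(cls + ": " + section.InnerText.Trim());
            }
        }
    }
}
```

The second query does not depend on class names, so it may suit pages whose structure is unknown, as long as their sections really are direct children of <body>.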


I suggest you take a look at this question on SO, where some of the basics and links to tutorials are available:

How to use HTML Agility pack

Daniel B
  • Yes, I know this, but I want it to work for any website, and the pattern is different for each website. That is my problem! :( – Homa Shafiei Aug 19 '13 at 19:22
  • Then I suggest that you come up with your own algorithm or general method that will magically do this for you, because no one has done that so far! If you know that the class names etc. are the same on all websites, simply loop through them; otherwise your question is not answerable. – Daniel B Aug 19 '13 at 19:26