Right now I'm using HtmlAgilityPack, but I find it very hard to select elements with XPath.

In Java I know Jsoup. Is there a .NET library that does the same, i.e. parses HTML and uses CSS-style selectors to find elements?

Elad Benda

2 Answers

Try Fizzler with HtmlAgilityPack.

Fizzler is:

A .NET library to select items from a node tree based on a CSS selector. The default implementation is based on HTMLAgilityPack and selects from HTML documents.

Example from project website:

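using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // provides the QuerySelectorAll extension method

// Load the document using HtmlAgilityPack as normal
var html = new HtmlDocument();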
html.LoadHtml(@"
  <html>
      <head></head>
      <body>
        <div>
          <p class='content'>Fizzler</p>
          <p>CSS Selector Engine</p></div>
      </body>
  </html>");

// Fizzler for HtmlAgilityPack is implemented as the 
// QuerySelectorAll extension method on HtmlNode

var document = html.DocumentNode;

// yields: [<p class="content">Fizzler</p>]
document.QuerySelectorAll(".content"); 

// yields: [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("p");

// yields empty sequence
document.QuerySelectorAll("body>p");

// yields [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("body p");

// yields [<p class="content">Fizzler</p>]
document.QuerySelectorAll("p:first-child");
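
For comparison, here is a minimal sketch of the same class lookup written once as a CSS selector and once as plain XPath, which is roughly what the question finds awkward. It assumes both the HtmlAgilityPack and Fizzler packages are referenced; the class name SelectorComparison and the variable names are made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

class SelectorComparison
{
    static void Main()
    {
        var html = new HtmlDocument();
        html.LoadHtml(
            "<html><head></head><body><div>" +
            "<p class='content'>Fizzler</p><p>CSS Selector Engine</p>" +
            "</div></body></html>");
        var root = html.DocumentNode;

        // CSS class selector, via Fizzler's extension method
        var byCss = root.QuerySelectorAll(".content").ToList();

        // Roughly equivalent XPath with HtmlAgilityPack alone: matching one
        // class token has to be spelled out with contains()/concat()
        var byXPath = root.SelectNodes(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' content ')]");

        Console.WriteLine(byCss.Count);
        // SelectNodes returns null rather than an empty collection when nothing matches
        Console.WriteLine(byXPath?.Count ?? 0);
    }
}
```

Both queries are meant to select the same element; the CSS form is shorter and also avoids the null check that SelectNodes needs.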
Kamil
  • Fizzler isn't being maintained (no updates since July '09) and has only a partial implementation of CSS3, compared to CsQuery's 100% coverage of CSS2 and CSS3. CsQuery also indexes documents and is much faster than Fizzler + HAP. – Jamie Treworgy Mar 04 '13 at 06:58
  • I would not recommend CsQuery due to the number of bugs. – tic Jul 30 '15 at 04:34

You could try this library, which looks very promising to me. I haven't tried it myself, so please share your experience with us if you give it a try.

Library: CsQuery
Website: https://github.com/jamietre/CsQuery
Sample:

// assumes dom was created beforehand, e.g. CQ dom = CQ.Create(html); (namespace CsQuery)
// get all elements that are first children within 'body' (i.e. excluding 'head')
var childSpans = dom["body"].Find(":first-child");
MUG4N
  • I would NOT recommend this one because it has some big problems with parent/child relationships. For example, if you want the 2nd child of the parent of '.myclass [whatever]', it will fail. – HellBaby May 22 '15 at 16:18
  • I would not recommend it either. It has pretty bad bugs that have never been fixed. I have reported 2 bugs in the issue tracker (it incorrectly matches identical elements next to each other, and text() also includes comments). – tic Jul 30 '15 at 04:33