33

There's some work in progress on adding XPath support to jsoup: https://github.com/jhy/jsoup/pull/80

  • Is it working?
  • How can I use it?
Stephan
gguardin
  • There is a boatload of information on this topic out there: https://stackoverflow.com/questions/11816878/jsoup-css-selector-code-xpath-code-included https://stackoverflow.com/questions/16335820/convert-xpath-to-jsoup-query https://stackoverflow.com/questions/11791596/how-to-get-absolute-path-of-an-html-element https://groups.google.com/forum/?fromgroups#!topic/jsoup/lj4_-EJwH1Q – Display Name is missing May 28 '13 at 19:45

3 Answers

13

jsoup doesn't support XPath yet, but you can try Xsoup ("Jsoup with XPath").

Here's an example quoted from the project's GitHub page:

import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.Assert;
import org.junit.Test;
import us.codecraft.xsoup.Xsoup;

@Test
public void testSelect() {

    String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
            "<table><tr><td>a</td><td>b</td></tr></table></html>";

    Document document = Jsoup.parse(html);

    // get() returns the first match as a string
    String result = Xsoup.compile("//a/@href").evaluate(document).get();
    Assert.assertEquals("https://github.com", result);

    // list() returns every match
    List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
    Assert.assertEquals("a", list.get(0));
    Assert.assertEquals("b", list.get(1));
}

The same page also lists the XPath features and expressions that Xsoup supports.
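
If you just want to see what those two expressions select, here is a rough sketch that evaluates them with the JDK's built-in `javax.xml.xpath` instead of Xsoup. Note the swap: this is not jsoup or Xsoup, and it only works here because the sample markup happens to be well-formed XML; real-world HTML usually isn't. The class name `XPathSketch` and the helpers `first` and `all` are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: evaluates XPath 1.0 with the JDK only.
// Assumption: the input is well-formed XML (the JDK parser rejects tag soup).
final class XPathSketch {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // String value of the first node matched by expr ("" if nothing matches).
    static String first(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Node values of all nodes matched by expr, in document order.
    static List<String> all(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, parse(xml), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><div><a href='https://github.com'>github.com</a></div>"
                + "<table><tr><td>a</td><td>b</td></tr></table></html>";

        System.out.println(first(html, "//a/@href"));    // prints https://github.com
        System.out.println(all(html, "//tr/td/text()")); // prints [a, b]
    }
}
```

For pages that are not well-formed you still need an HTML-tolerant parser in front, which is the gap Xsoup fills on top of jsoup.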

Kim Moritz
ollo
2

Not yet, but the JsoupXpath project provides it. For example:

String html = "<html><body><script>console.log('aaaaa')</script><div class='test'>some body</div><div class='xiao'>Two</div></body></html>";
JXDocument underTest = JXDocument.create(html);
String xpath = "//div[contains(@class,'xiao')]/text()";
JXNode node = underTest.selNOne(xpath);
Assert.assertEquals("Two", node.asString());

By the way, it supports the complete W3C XPath 1.0 standard syntax. For example:

//ul[@class='subject-list']/li[./div/div/span[@class='pl']/num()>(1000+90*(2*50))][last()][1]/div/h2/allText()
//ul[@class='subject-list']/li[not(contains(self::li/div/div/span[@class='pl']//text(),'14582'))]/div/h2//text()
xiaohuo
0

HtmlUnit supports XPath. I've used this for certain projects and it works reasonably well.

Rob Evans