2

I have an HTML table that I need to select using XPath. The table may or may not contain multiple classes, but I only want tables that include a specific class.

Here is a sample HTML snippet:

<html>
  <body>
    <table class="no-border">
      <tr>
        <th colspan="2">Blah Blah Blah</th>
      </tr>
      <tr>
        <td>Content</td>
        <td>
          <table class="info no-border">
            <tr>
              <!-- Inner table content -->
            </tr>
          </table>
        </td>
      </tr>
    </table>
  </body>
</html>

I need to use XPath to retrieve ONLY the table that includes the class info. I've tried using /html/body/table/tr/td/table[@class='info*'], but that doesn't work. The table I'm trying to retrieve may exist ANYWHERE in the HTML document - technically, not ANYWHERE, but there may be varying levels of hierarchy between the outer and inner table.

If anyone can point me in the right direction, I'd be grateful.

4 Answers

6

The closest you can do is with the contains function:

//table[contains(@class,'info')]

But please be aware that this also captures a table with the class information, or any other class that has info as a substring. contains() on its own does a plain substring test, not a whole-word match, so you'd have to filter the results to rule out that case.
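As a rough illustration of that filtering step, here is a minimal Python sketch using only the standard library. The sample markup and the helper name tables_with_class are made up for the example, and it walks every table directly instead of starting from an XPath result, so treat it as one possible approach and not the only way to do it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: kept well-formed so the stdlib XML parser can read it.
SAMPLE = """
<html><body>
  <table class="no-border"><tr><td>
    <table class="info no-border"><tr><td>inner</td></tr></table>
  </td></tr></table>
  <table class="information"><tr><td>decoy</td></tr></table>
</body></html>
"""

def tables_with_class(root, name):
    """Return <table> elements whose class attribute has `name` as a whole token."""
    return [
        table
        for table in root.iter("table")
        # split() breaks the attribute on whitespace, so the token
        # "information" is never mistaken for "info"
        if name in table.get("class", "").split()
    ]

root = ET.fromstring(SAMPLE)
matches = tables_with_class(root, "info")
print([t.get("class") for t in matches])
```

The same token check can be applied to whatever list of candidates the contains() query returns in your own XPath library.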

Niet the Dark Absol
  • This gets me a lot further along than I was. If no one offers a more elegant solution, I'll take this as the accepted answer. Thanks a lot, mate! – Jason Satterfield Aug 24 '13 at 16:18
1

What you'd ideally want is a CSS selector like table.info. Some toolkits for XML/HTML parsing do support these selectors and translate them to XPath expressions internally, e.g. cssselect if you use Python (available through lxml as lxml.cssselect), or Nokogiri for Ruby.

In the general case, to emulate a CSS selector like table.info with XPath, a common trick or pattern is to use contains() combined with concat() and space characters. In your case, it looks like this:

.//table[contains(concat(' ', normalize-space(@class), ' '), ' info ')]
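To see why the padding works, here is a small Python sketch that mirrors each step of the expression on a plain string. The function name has_class is hypothetical, and Python's split() treats a few more characters as whitespace than normalize-space() does, so this is an approximation of the XPath behaviour. Note the space on both sides of the class name in the final test; with only a leading space, information would match as well.

```python
def has_class(class_attr, name):
    """Approximate contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    # normalize-space(): trim the ends and collapse whitespace runs to one space
    normalized = " ".join(class_attr.split())
    # concat(' ', ..., ' '): pad both ends so every token is space-delimited
    padded = " " + normalized + " "
    # contains(): look for the name with a space on each side
    return " " + name + " " in padded

for value in ["info", "info no-border", "no-border  info", "information", "no-info"]:
    print(repr(value), has_class(value, "info"))
```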
paul trmbrth
1

I know that you did not ask for this answer, but I think it will help you make your queries more precise.

//table[ (contains(@class,"result-cont") or contains(@class,"resultCont")) and not(contains(@class,"hide")) ]

This selects tables whose class attribute contains 'result-cont' or 'resultCont' and does not contain 'hide'. These are still substring tests, so 'hide' also excludes a class such as 'hidden-row'.

Harm
0

XPath 1.0 is, indeed, fairly limited in its string processing. You can do modest amounts of processing with starts-with(), substring() and similar functions. See this answer for creating something similar to a regex.

XSLT 2.0 and XPath 2.0 (which not all browsers and software support) have support for regex, for example through the matches() function.
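As an untested sketch, with an XPath 2.0 engine the whole-word test could be written as //table[matches(@class, '(^|\s)info(\s|$)')]. The pattern is my own assumption, not something from the question. The Python snippet below exercises the same pattern with the re module to show what it accepts and rejects; regex dialects differ slightly between Python and XPath, so check it against your engine.

```python
import re

# "info" bounded by whitespace or by the start/end of the attribute value
WHOLE_CLASS = re.compile(r"(^|\s)info(\s|$)")

def matches_info(class_attr):
    """Return True when `info` appears as a whole class token."""
    return WHOLE_CLASS.search(class_attr) is not None

for value in ["info", "info no-border", "no-border info", "information", "no-info"]:
    print(repr(value), matches_info(value))
```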

peter.murray.rust