For the following HTML:

<table> 
 <tbody>
  <tr valign="TOP">
   <td align="LEFT"><b>Licensee Name:</b></td>
   <td align="LEFT">Some-last-name Some-first-name</td>
  </tr> 
  <tr valign="TOP">
   <td align="LEFT"><b>License Type:</b></td>
   <td align="LEFT">Optometrist (OPT)</td>
  </tr> 
.
.
.
 </tbody>
</table>

The following code produces an empty collection of Elements:

Elements rows = docOptometristDetail.select("body > table ~ tr");

But this code works:

Elements tables = docOptometristDetail.select("body > table");
Elements rows = tables.select("tr");

I was expecting the tilde operator:

table ~ tr

to find the <table> element, skip the <tbody> element and build a collection of the <tr> elements.

Have I uncovered a weakness in Jsoup's selector syntax parser or am I attempting to violate some operator precedence rules?

I tried (body > table) ~ tr but that throws a SelectorParseException.

Is there a way to do this selection (i.e. getting an Elements collection of <tr> elements) with a single selector expression?

mbmast

1 Answer

In CSS, the tilde character, ~, is a general sibling combinator.

The selector table ~ tr selects tr elements that follow a table element and share the same parent. In valid HTML a tr is never a sibling of a table, so nothing is selected.

In theory, the selector table ~ tr would select the tr elements below:

<table></table>
<tr></tr> <!-- These 'tr' elements are following siblings of the 'table' -->
<tr></tr> <!-- This is invalid HTML, though. -->

It sounds like you just need to select descendants, so body > table tr will work.
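To check this against the markup in the question, here is a minimal sketch. It assumes jsoup is on the classpath; the class name CombinatorDemo, the helper count, and the trimmed HTML string are made up for illustration and are not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CombinatorDemo {
    // Trimmed copy of the table from the question (two rows only).
    static final String HTML =
        "<table><tbody>"
        + "<tr valign=\"TOP\"><td><b>Licensee Name:</b></td>"
        + "<td>Some-last-name Some-first-name</td></tr>"
        + "<tr valign=\"TOP\"><td><b>License Type:</b></td>"
        + "<td>Optometrist (OPT)</td></tr>"
        + "</tbody></table>";

    // Parses the sample markup and returns how many elements the selector matches.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // General sibling combinator: no tr is a sibling of the table.
        System.out.println(count("body > table ~ tr")); // prints 0
        // Descendant combinator: every tr nested anywhere inside the table.
        System.out.println(count("body > table tr"));   // prints 2
    }
}
```

The descendant form does the whole selection in a single expression, which is what the question asked for.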

Josh Crozier
  • This does work! Thanks. If the table is inside the body, you get to it via "body > table". So if the rows are inside the table, why don't you use "table > tr" (which I had previously tried, with no success)? – mbmast Nov 28 '15 at 18:10
  • @mbmast It's because `>` selects only *direct* child elements, whereas a space selects all descendants. The `tr` elements are not direct children of the `table` element (the `tbody` sits in between), but they are descendants. See: http://stackoverflow.com/questions/3225891/what-does-the-greater-than-sign-css-selector-mean – Josh Crozier Nov 28 '15 at 18:12
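
The difference between the child and descendant combinators can be seen with a short sketch, again assuming jsoup is on the classpath. The class name ChildVsDescendant, its helpers, and the trimmed HTML string are hypothetical names used only for this example. If the child combinator is preferred, the intermediate tbody has to be spelled out.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ChildVsDescendant {
    // Trimmed copy of the table from the question (two rows only).
    static final String HTML =
        "<table><tbody>"
        + "<tr valign=\"TOP\"><td><b>Licensee Name:</b></td>"
        + "<td>Some-last-name Some-first-name</td></tr>"
        + "<tr valign=\"TOP\"><td><b>License Type:</b></td>"
        + "<td>Optometrist (OPT)</td></tr>"
        + "</tbody></table>";

    // Returns the rows matched by the given selector in the sample markup.
    static Elements rows(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector);
    }

    // Returns the text of the second cell (the value) of the first matched row.
    static String firstValue(String selector) {
        Element firstRow = rows(selector).first();
        return firstRow.child(1).text();
    }

    public static void main(String[] args) {
        // tr is a grandchild of table here, so the child combinator finds nothing.
        System.out.println(rows("table > tr").size());         // prints 0
        // Naming every level of the hierarchy makes the child combinator match.
        System.out.println(rows("table > tbody > tr").size()); // prints 2
        System.out.println(firstValue("table > tbody > tr"));  // prints Some-last-name Some-first-name
    }
}
```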