In this document, a blank second column means that the row uses the previous row's value.

<doc>
<table>
<tr><td>ASU</td><td>CS</td><td>3</td></tr>
<tr><td>ASU</td><td>English</td><td>3</td></tr>
<tr><td>ASU</td><td></td><td>4</td></tr>
<tr><td>ASU</td><td>French</td><td>3</td></tr>
</table>
<table>
<tr><td>CMU</td><td>CS</td><td>4</td></tr>
<tr><td>CMU</td><td>English</td><td>3</td></tr>
<tr><td>CMU</td><td>French</td><td>3</td></tr>
<tr><td>CMU</td><td></td><td>4</td></tr>
</table>
<table>
<tr><td>SDSU</td><td>English</td><td>3</td></tr>
<tr><td>SDSU</td><td></td><td>4</td></tr>
<tr><td>SDSU</td><td></td><td>5</td></tr>
<tr><td>SDSU</td><td>French</td><td>4</td></tr>
</table>
</doc>

I want the rows where the second column is English (explicitly or by inheriting it), so these would be the rows:

<tr><td>ASU</td><td>English</td><td>3</td></tr>
<tr><td>ASU</td><td></td><td>4</td></tr>
<tr><td>CMU</td><td>English</td><td>3</td></tr>
<tr><td>SDSU</td><td>English</td><td>3</td></tr>
<tr><td>SDSU</td><td></td><td>4</td></tr>
<tr><td>SDSU</td><td></td><td>5</td></tr>

What would the XPath be for this?

CW Holeman II

1 Answer

(This uses XPath 1.0; there may be better solutions with more recent XPath versions.)

First, you want tr elements, so that’s straightforward:

/doc/table/tr[...some predicate...]

The rows you want are either:

  1. Those where the second td contains just “English”

    td[2] = 'English'
    
  2. Or those where the second td is empty...

    td[2] = ''
    

    and, looking at the previous sibling rows which don’t have an empty second td...

    preceding-sibling::tr[td[2] != '']
    

    the first one ([1], which on the reverse preceding-sibling axis means the nearest one) has a second td that contains “English”

    /td[2] = 'English'
    

So combining all that, a query that gives you the desired rows is:

/doc/table/tr[td[2] = 'English'
  or (td[2] = ''
    and preceding-sibling::tr[td[2] != ''][1]/td[2] = 'English')]
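
As a cross-check on the logic, here is a minimal Python sketch that applies the same fill-down rule procedurally rather than evaluating the XPath itself. Python's standard-library ElementTree supports only a limited XPath subset, so the sketch walks the rows directly. The function name rows_with_effective_value and the DOC constant are made up for this illustration, and the sketch assumes every row has at least two td cells.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
DOC = """<doc>
<table>
<tr><td>ASU</td><td>CS</td><td>3</td></tr>
<tr><td>ASU</td><td>English</td><td>3</td></tr>
<tr><td>ASU</td><td></td><td>4</td></tr>
<tr><td>ASU</td><td>French</td><td>3</td></tr>
</table>
<table>
<tr><td>CMU</td><td>CS</td><td>4</td></tr>
<tr><td>CMU</td><td>English</td><td>3</td></tr>
<tr><td>CMU</td><td>French</td><td>3</td></tr>
<tr><td>CMU</td><td></td><td>4</td></tr>
</table>
<table>
<tr><td>SDSU</td><td>English</td><td>3</td></tr>
<tr><td>SDSU</td><td></td><td>4</td></tr>
<tr><td>SDSU</td><td></td><td>5</td></tr>
<tr><td>SDSU</td><td>French</td><td>4</td></tr>
</table>
</doc>"""


def rows_with_effective_value(xml_text, wanted):
    """Return cell-text tuples for rows whose effective second column is `wanted`.

    A blank second cell inherits the nearest earlier non-blank value in the
    same table, mirroring preceding-sibling::tr[td[2] != ''][1] in the query.
    """
    root = ET.fromstring(xml_text)
    selected = []
    for table in root.findall("table"):
        current = None  # last non-empty second-column value seen in this table
        for tr in table.findall("tr"):
            # An empty <td></td> has text None, so normalise it to "".
            cells = [(td.text or "") for td in tr.findall("td")]
            if cells[1] != "":
                current = cells[1]
            if current == wanted:
                selected.append(tuple(cells))
    return selected


english_rows = rows_with_effective_value(DOC, "English")
for row in english_rows:
    print(row)
```

On the sample document this selects the same six rows listed in the question. The tracked value is reset for each table because preceding-sibling never looks outside the current table element.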
matt