
I am parsing an HTML page and have a long CSS selector (I can't figure out a shorter one, because the page is stupid). It should select all the tr elements in the table, but it only selects the 2nd row. What am I missing?

The selector:

body > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(3) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(8) > td:nth-child(1) > table:nth-child(4) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) tr:not(:first-child)

The page has multiple tables nested inside each other, but the first 90% of the selector doesn't even matter. After selecting the table I want to work with, I follow up with "[space]tr:not(...)", so it should select all the descendant rows, shouldn't it?

Example HTML page (I can't link the original, because you need to log in to access it): http://pastebin.com/gprXTvzz

Once the selector has reached the table I want (the "... > tbody:nth-child(1)" part, just before " tr:not(:first-child)"), that part of the page looks like this:

<tbody>
   <tr valign="bottom">
      <td class="blackmedium" width="80"><b>Part Number</b></td>
      <td class="blackmedium" width="100"><b>Manufacturer</b></td>
      <td class="blackmedium" width="40"><b>Abbr.</b></td>
      <td class="blackmedium" width="50"><b>WIX Part Number</b></td>
      <td class="blackmedium" width="50"><b>Lead Time</b></td>
   </tr>
   <tr>
      <td class="blackmedium" width="80">A0002701098</td>
      <td class="blackmedium" width="100">MERCEDES-BENZ</td>
      <td class="blackmedium" width="40">MBZ</td>
      <td class="blackmedium" width="50"> <a href="http://www.wixindustrialfilters.com/cross.aspx?Part=W03AT780" target="_blank">W03AT780</a>
      </td>
      <td class="blackmedium" width="50">
         STOCK
      </td>
   </tr>
   <tr bgcolor="#e0e0e0">
      <td class="blackmedium" width="80">A0002701598 Discontinued</td>
      <td class="blackmedium" width="100">MERCEDES-BENZ</td>
      <td class="blackmedium" width="40">MBZ</td>
      <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=58892','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">58892</a>
      </td>
      <td class="blackmedium" width="50">
      </td>
   </tr>
   <tr>
      <td class="blackmedium" width="80">A0002772395</td>
      <td class="blackmedium" width="100">MERCEDES-BENZ</td>
      <td class="blackmedium" width="40">MBZ</td>
      <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=51249','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">51249</a>
      </td>
      <td class="blackmedium" width="50">
      </td>
   </tr>
   <tr bgcolor="#e0e0e0">
      <td class="blackmedium" width="80">A0002772895</td>
      <td class="blackmedium" width="100">MERCEDES-BENZ</td>
      <td class="blackmedium" width="40">MBZ</td>
      <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=57701','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">57701</a>
      </td>
      <td class="blackmedium" width="50">
      </td>
   </tr>
</tbody>
BoltClock
szab.kel


This doesn't exactly answer your question, but when the markup is not parsing-friendly and I need to find a table element deeply nested in it, I prefer to locate the table by the presence of a specific header. In this case, you can locate the table that has the "Part Number" header. Example XPath:

//table[(tbody/tr | tr)[1]/td/b = "Part Number"]
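A minimal JDK-only sketch of this header-based lookup, assuming the markup has already been reduced to well-formed XML (the real page is tag soup, so in practice it would go through an HTML parser such as jsoup first). The class name, the trimmed sample, and the partNumbers helper are hypothetical. The expression is written so it matches whether the rows sit inside a tbody or directly under the table, and the outer layout table is skipped because its first row has no b header cell.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeaderTableLookup {
    // Hypothetical, trimmed stand-in for the real page: an outer layout table
    // wrapping the parts table from the question (only two columns kept).
    static final String SAMPLE =
        "<html><body><table><tbody><tr><td>"
        + "<table><tbody>"
        + "<tr valign=\"bottom\"><td><b>Part Number</b></td><td><b>Manufacturer</b></td></tr>"
        + "<tr><td>A0002701098</td><td>MERCEDES-BENZ</td></tr>"
        + "<tr bgcolor=\"#e0e0e0\"><td>A0002701598 Discontinued</td><td>MERCEDES-BENZ</td></tr>"
        + "<tr><td>A0002772395</td><td>MERCEDES-BENZ</td></tr>"
        + "</tbody></table>"
        + "</td></tr></tbody></table></body></html>";

    /** Finds the table whose first row has a "Part Number" header and returns the first cell of every later row. */
    static List<String> partNumbers(String xml) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Locate the table by its header; rows may sit under <tbody> or directly under <table>.
            Element table = (Element) xpath.evaluate(
                "//table[(tbody/tr | tr)[1]/td/b = 'Part Number']", doc, XPathConstants.NODE);
            if (table == null) {
                return result; // no table with that header
            }
            // All rows of that table except the first (header) row.
            NodeList rows = (NodeList) xpath.evaluate(
                "(tbody/tr | tr)[position() > 1]", table, XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                // String value of the row's first cell, i.e. the part number column.
                result.add(xpath.evaluate("td[1]", rows.item(i)).trim());
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the sample", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(partNumbers(SAMPLE));
    }
}
```

Run against the sample, this should print the three part numbers from the data rows and leave out the "Part Number" header row.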

Then, within that table, you can use the "not first child" CSS selector:

tr:not(:first-child)

Or you can use the adjacent sibling combinator, which matches a tr that immediately follows another tr and therefore excludes the first row:

tr + tr
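The two "skip the first row" selectors can be checked without jsoup. This JDK-only sketch evaluates hand-translated XPath equivalents of tr:not(:first-child) and tr + tr (the translations are an assumption of this sketch, not part of the answer) against a trimmed, well-formed copy of the parts table; the class, constant, and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SkipHeaderRow {
    // Hypothetical, trimmed stand-in for the parts table: one header row plus three data rows.
    static final String SAMPLE =
        "<table><tbody>"
        + "<tr valign=\"bottom\"><td><b>Part Number</b></td></tr>"
        + "<tr><td>A0002701098</td></tr>"
        + "<tr bgcolor=\"#e0e0e0\"><td>A0002701598 Discontinued</td></tr>"
        + "<tr><td>A0002772395</td></tr>"
        + "</tbody></table>";

    // Assumed XPath translations of the two CSS selectors:
    //   tr:not(:first-child) -> a tr that has at least one preceding sibling element
    //   tr + tr              -> a tr whose nearest preceding sibling element is a tr
    static final String NOT_FIRST_CHILD = "//tbody/tr[preceding-sibling::*]";
    static final String ADJACENT_SIBLING = "//tbody/tr[preceding-sibling::*[1][self::tr]]";

    /** Counts the nodes the given XPath expression matches in the XML string. */
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Both strategies skip the header row and match the three data rows.
        System.out.println(count(SAMPLE, NOT_FIRST_CHILD));  // 3
        System.out.println(count(SAMPLE, ADJACENT_SIBLING)); // 3
    }
}
```

Both expressions match the same rows here; they would only differ if an element other than a tr preceded a row.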

Hope this simplifies things.

alecxe
  • I couldn't use XPath, but I solved it by selecting all the tables first and then, since I know which index I need, selecting all the tr elements of that table in a second statement. Yours should work too. (using jsoup) – szab.kel Feb 12 '16 at 13:33
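The workaround from this comment (select every table, pick the one at a known index, then select its rows) can be sketched with the JDK's DOM API under the same well-formed-sample assumption. In jsoup itself it would look roughly like doc.select("table").get(index).select("tr"). The class name, the sample, and index 1 below are illustrative only; the real page's index is whatever the OP found.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableByIndex {
    // Hypothetical stand-in: an outer layout table (index 0) wrapping the parts table (index 1).
    static final String SAMPLE =
        "<html><body><table><tbody><tr><td>"
        + "<table><tbody>"
        + "<tr><td><b>Part Number</b></td></tr>"
        + "<tr><td>A0002701098</td></tr>"
        + "<tr><td>A0002772395</td></tr>"
        + "</tbody></table>"
        + "</td></tr></tbody></table></body></html>";

    /** Returns the text of every row after the first in the table at the given document-order index. */
    static List<String> dataRows(String xml, int tableIndex) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Step 1: all tables in document order (an outer table comes before the tables nested in it).
            NodeList tables = doc.getElementsByTagName("table");
            Element table = (Element) tables.item(tableIndex);
            // Step 2: the rows of that table; index 0 is the header row, so start at 1.
            NodeList rows = table.getElementsByTagName("tr");
            List<String> result = new ArrayList<>();
            for (int i = 1; i < rows.getLength(); i++) {
                result.add(rows.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample or find table " + tableIndex, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dataRows(SAMPLE, 1));
    }
}
```

getElementsByTagName returns all descendant rows, so rows of any table nested inside the chosen one would be included as well; that is harmless for the parts table, which has no nested tables, but picking the outer table (index 0) would mix in the inner rows.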