
I am new to import.io and am trying to build a crawler that extracts attribute data from supplier websites so that I can audit it against our internal attribute database. I have watched several videos, and my syntax looks right to me, but my manual XPath override isn't returning attribute values. Here is a sample of the HTML:

<table>
<tbody><tr class="oddRow">
<td class="label">&nbsp;Adhesive Type&lrm;</td><td>&nbsp;Epoxy&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Applications&lrm;</td><td>&nbsp;Hard Disk Drive Component Assembly&lrm;
</td>
</tr>
<tr class="oddRow">
<td class="label">&nbsp;Brand&lrm;</td><td>&nbsp;Scotch-Weld&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Capabilities&lrm;</td><td>&nbsp;Sustainability&lrm;
</td>
</tr>
<tr class="oddRow">
<td class="label">&nbsp;Color&lrm;</td><td>&nbsp;Clear Amber&lrm;
</td>

I am trying to write an XPath following-sibling expression to grab the value for "Color" through an import.io crawler. The XPath I get when I select "Color" is:

//*[@id="attributeList"]/table/tbody/tr[5]/td[1]

I've tried to use:

//*[@id="attributeList"]/table/tbody/tr/td[.="Color"]/following-sibling::td

But it doesn't return the color value from the table. I'm not sure whether it has something to do with the odd and even row classes. Looking at the HTML, the expression seems logical: the label is "Color", and the attribute value is in the td that follows it.

unutbu
Elizabeth VO

1 Answer


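The text in the selected td node contains more than just "Color": it is &nbsp;Color&lrm;, that is, a non-breaking space, then Color, then a left-to-right mark. The exact comparison .="Color" therefore never matches. Instead, you could select td nodes whose text contains the string "Color":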

'//*[@id="attributeList"]/table/tbody/tr/td[contains(text(), "Color")]/following-sibling::td/text()'
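To check the difference outside import.io, here is a minimal sketch in Python. It assumes lxml is installed, and it wraps a shortened copy of the sample table in a div with id="attributeList" (the wrapper is a guess based on the XPath in the question, since the sample HTML does not show it). The clean helper is hypothetical and only strips the two invisible characters from the result.

```python
from lxml import html

# Assumed wrapper: the question's XPath implies a parent with id="attributeList".
SAMPLE = """
<div id="attributeList">
<table>
<tbody><tr class="oddRow">
<td class="label">&nbsp;Brand&lrm;</td><td>&nbsp;Scotch-Weld&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Color&lrm;</td><td>&nbsp;Clear Amber&lrm;
</td>
</tr>
</tbody>
</table>
</div>
"""

doc = html.fromstring(SAMPLE)
BASE = '//*[@id="attributeList"]/table/tbody/tr/td'

# Exact match: fails because the label text is "\u00a0Color\u200e", not "Color".
exact = doc.xpath(BASE + '[.="Color"]/following-sibling::td/text()')

# Substring match: succeeds despite the surrounding invisible characters.
partial = doc.xpath(BASE + '[contains(text(), "Color")]/following-sibling::td/text()')


def clean(value):
    """Hypothetical helper: drop the non-breaking space and left-to-right mark."""
    return value.replace("\u00a0", " ").replace("\u200e", "").strip()


colors = [clean(v) for v in partial]
print(exact)   # no nodes selected by the exact comparison
print(colors)  # cleaned value(s) from the following td
```

Note that contains() also matches any other label that merely includes the word "Color", so on a real page it may be worth checking that only one row is selected.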
unutbu