I have some text I need to extract using XPath selectors. The text can be in 3 different forms:

<td>
    TARGET_TEXT
</td>

<td>
    <p>
        TARGET_TEXT
    </p>
</td>

<td>
    <p>
        <strong>TARGET_TEXT</strong>
    </p>
</td>

Is there an XPath statement/selector I can use that will handle all 3 of these scenarios? Or is it possible to add OR statements in an XPath selector?

for tr in table_rows:
    # only handles case 1
    topic_name = tr.xpath('.//td[1]/text()').extract()[0]
sazr
    Hey, Jake, how about [**accepting**](http://meta.stackoverflow.com/q/5234/234215) some of the fine answers you've gotten in the past. You've asked 18 questions since August and accepted 0. Something's wrong there. – kjhughes Nov 18 '16 at 03:45

3 Answers

This XPath,

normalize-space(/td)

returns the whitespace-normalized string value of /td, which is the same string,

TARGET_TEXT

for all three of your cases. Inside your loop over rows, the relative form would be normalize-space(.//td[1]).

For more information on string values in XPath, see Testing text() nodes vs string values in XPath.
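To make the string-value idea concrete, here is a minimal sketch using only Python's standard library. ElementTree does not support XPath functions, so `normalized_string_value` is a hypothetical helper that approximates `normalize-space()`: it concatenates all descendant text and collapses runs of whitespace. The three `CASES` strings are the snippets from the question. Note that Python's `split()` treats a few more characters as whitespace than XPath does, so this is an approximation, not an exact equivalent.

```python
import xml.etree.ElementTree as ET

# The three markup variants from the question.
CASES = [
    "<td>\n    TARGET_TEXT\n</td>",
    "<td>\n    <p>\n        TARGET_TEXT\n    </p>\n</td>",
    "<td>\n    <p>\n        <strong>TARGET_TEXT</strong>\n    </p>\n</td>",
]


def normalized_string_value(element):
    """Approximate XPath normalize-space(element) (hypothetical helper)."""
    # String value: all descendant text concatenated in document order.
    string_value = "".join(element.itertext())
    # Trim the ends and collapse internal whitespace runs to one space.
    return " ".join(string_value.split())


results = [normalized_string_value(ET.fromstring(case)) for case in CASES]
print(results)
```

All three cases reduce to the same string because the nesting of `<p>` and `<strong>` does not change the concatenated text, only the whitespace around it.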

kjhughes
for tr in table_rows:
    # every descendant text node of each td, including whitespace-only nodes
    all_three = tr.xpath('.//td//text()').extract()
宏杰李

Looks like the following is adequate:

for tr in table_rows:
    topic_name = tr.xpath('.//td[1]//text()').extract()
    # topic_name can be ['\r\n', 'TARGET_TEXT', '\r\n']
    topic_name = ''.join(topic_name).strip()  # strip the surrounding whitespace
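For anyone wanting to see what those text nodes look like outside Scrapy, here is a rough standard-library sketch. `cell_text` is a hypothetical helper, and `itertext()` stands in for the `.//text()` query. It shows that most of the pieces are whitespace-only, so they are stripped and dropped before joining.

```python
import xml.etree.ElementTree as ET


def cell_text(td):
    """Join the non-empty text pieces of a cell (hypothetical helper)."""
    # Strip each text node, then keep only the ones that still have content.
    pieces = [piece.strip() for piece in td.itertext()]
    return " ".join(piece for piece in pieces if piece)


# Case 3 from the question: the text sits inside <p><strong>.
td = ET.fromstring(
    "<td>\n    <p>\n        <strong>TARGET_TEXT</strong>\n    </p>\n</td>"
)
raw = list(td.itertext())   # whitespace-only nodes surround the real text
topic_name = cell_text(td)
print(raw)
print(topic_name)
```

Filtering per piece rather than stripping the joined string also copes with cells that hold several text fragments separated by markup.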
sazr