Let's say I have this HTML:

<body>
  <div class="items">
    <span class="label">label1</span>
    <div class="value">value1</div>
  </div>

  <div class="items">
    <span class="label">label2</span>
    <div class="value">
      <a class="link">value2</a>
    </div>
  </div>

  <div class="items">
    <span class="label">label3</span>
    <div class="value">
      <a class="link">value3</a>
    </div>
  </div>

  <div class="items">
    <span class="label">label4</span>
    <div class="value">value4</div>
  </div>
</body>

I'm trying to get the text from <a class="link"> if it exists, or otherwise from <div class="value">.

for result in response.xpath("//div[@class='items']"):
    label = result.xpath(".//span[@class='label']//text()").extract_first()
    # here I'm trying to use an "or" operation to get
    # the <a> text if possible, or else the <div> text
    value = result.xpath(".//a[@class='link']//text()"
                         "|.//div[@class='value']//text()").get()
    print(label, value)

Results:

label1 value1
label2 
label3 
label4 value4

This code only picks up the text from <div class='value'>, even though <a class='link'> exists.

What do I need?
I would like the XPath to return the <a> text if possible; otherwise it should take the <div> text.

Charles Duffy
magnus250
  • Think of this like a set-union OR, not a boolean OR; just like if you run `{1,2} | {3}` in Python, you get `{1,2,3}` as a result. – Charles Duffy Dec 28 '19 at 15:45
  • Okay, is there an option to get the result that interests me? – magnus250 Dec 28 '19 at 15:51
  • Is it *possible*? Yes. Is it *easier than doing the logic in Python*? No. Honestly, what I recommend in practice is just running two separate XPath queries. – Charles Duffy Dec 28 '19 at 15:51
  • For the general approach demonstrating that it's possible, see [Is there an if-then-else statement in XPath?](https://stackoverflow.com/questions/971067/is-there-an-if-then-else-statement-in-xpath). – Charles Duffy Dec 28 '19 at 15:52
  • I need to separate this logic. I tried something like this `# value = result.xpath('''if(.//a[@class='link']/text())) then .//a[@class='link']/text()) else .//div[@class='value']/text()''').get()`, but the only result I get is an exception from the XPath interpreter. – magnus250 Dec 28 '19 at 16:16
  • @magnus250 There is a much simpler and shorter XPath expression that selects the wanted text nodes – Dimitre Novatchev Dec 29 '19 at 02:45
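
A minimal sketch of the "two separate queries" fallback suggested in the comments above. It assumes Python's standard-library `xml.etree.ElementTree` as a stand-in for Scrapy's selectors, which only works here because the sample markup is well-formed; `SAMPLE` and `extract_pairs` are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup copied from the question (well-formed, so ElementTree can parse it).
SAMPLE = """<body>
  <div class="items">
    <span class="label">label1</span>
    <div class="value">value1</div>
  </div>
  <div class="items">
    <span class="label">label2</span>
    <div class="value">
      <a class="link">value2</a>
    </div>
  </div>
  <div class="items">
    <span class="label">label3</span>
    <div class="value">
      <a class="link">value3</a>
    </div>
  </div>
  <div class="items">
    <span class="label">label4</span>
    <div class="value">value4</div>
  </div>
</body>"""


def extract_pairs(markup):
    """Return (label, value) pairs, preferring the <a> text over the <div> text."""
    root = ET.fromstring(markup)
    pairs = []
    for item in root.findall(".//div[@class='items']"):
        label = item.findtext("span[@class='label']")
        # First query: the link, if there is one.
        node = item.find(".//a[@class='link']")
        if node is None:
            # Second query: fall back to the value <div> itself.
            node = item.find("div[@class='value']")
        value = (node.text or "").strip() if node is not None else None
        pairs.append((label, value))
    return pairs


for label, value in extract_pairs(SAMPLE):
    print(label, value)
```

In a Scrapy callback the same idea could probably be written as `result.xpath(".//a[@class='link']/text()").get() or result.xpath(".//div[@class='value']/text()").get()`, since `.get()` returns `None` when nothing matches.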

2 Answers

1

Here is the XPath that you should use (shown here for the second item):

//div[@class='items'][2]//div[@class='value']/a|//div[@class='items'][2]//div[@class='value'][not(a)]

So replace the corresponding line in your code with this:

value = result.xpath(".//div[@class='value']/a/text()|.//div[@class='value'][not(a)]/text()").get()
supputuri
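
For readers who want to try the expression from this answer outside a spider, here is a small sketch. It assumes `lxml` is installed and uses `lxml.etree` rather than Scrapy's `response`; `SAMPLE`, `VALUE_XPATH`, and `extract_pairs` are hypothetical names, and the markup is a shortened copy of the question's sample.

```python
from lxml import etree

# Shortened copy of the question's markup: one plain value, one linked value.
SAMPLE = """<body>
  <div class="items">
    <span class="label">label1</span>
    <div class="value">value1</div>
  </div>
  <div class="items">
    <span class="label">label2</span>
    <div class="value">
      <a class="link">value2</a>
    </div>
  </div>
</body>"""

# The expression from the answer: the <a> text when the value <div> has an <a> child,
# otherwise the text of a value <div> that has no <a> child.
VALUE_XPATH = (
    ".//div[@class='value']/a/text()"
    " | .//div[@class='value'][not(a)]/text()"
)


def extract_pairs(markup):
    """Return (label, value) pairs using the single union expression."""
    root = etree.fromstring(markup)
    pairs = []
    for item in root.xpath("//div[@class='items']"):
        label = str(item.xpath("string(.//span[@class='label'])"))
        texts = item.xpath(VALUE_XPATH)
        value = str(texts[0]).strip() if texts else None
        pairs.append((label, value))
    return pairs


for label, value in extract_pairs(SAMPLE):
    print(label, value)
```

The `[not(a)]` predicate is what keeps the whitespace-only text of a `<div>` that wraps a link out of the result, so the first node returned is the one that is wanted.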
0

I'm trying to get the text from <a class="link"> if it exists, or otherwise from <div class="value">.

Here is a simple / short XPath 1.0 expression that selects exactly all the wanted text nodes:

(//div[@class='value'] | //a[@class='link'])/text()

XSLT 1.0-based verification:

This transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:for-each select="(//div[@class='value'] | //a[@class='link'])/text()">
      <xsl:if test="not(position() = 1)">, </xsl:if>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

evaluates the XPath expression and outputs each selected text-node using convenient delimiters.

The wanted result is produced:

value1, value2, value3, value4
Dimitre Novatchev
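
One thing worth checking when using this expression outside XSLT: the stylesheet above relies on `xsl:strip-space`, so whitespace-only text nodes never reach the XPath. A parser that keeps them will also return the indentation inside a `<div class="value">` that wraps a link. The sketch below assumes `lxml` is installed; `SAMPLE`, `RAW_XPATH`, and `CLEAN_XPATH` are hypothetical names, and the `[normalize-space()]` predicate is an addition that is not part of the original expression.

```python
from lxml import etree

# Shortened copy of the question's markup, indentation preserved on purpose.
SAMPLE = """<body>
  <div class="items">
    <span class="label">label1</span>
    <div class="value">value1</div>
  </div>
  <div class="items">
    <span class="label">label2</span>
    <div class="value">
      <a class="link">value2</a>
    </div>
  </div>
</body>"""

# The expression from the answer, unchanged.
RAW_XPATH = "(//div[@class='value'] | //a[@class='link'])/text()"
# Same expression, keeping only text nodes that are not whitespace-only.
CLEAN_XPATH = RAW_XPATH + "[normalize-space()]"

root = etree.fromstring(SAMPLE)
raw = [str(t) for t in root.xpath(RAW_XPATH)]
cleaned = [str(t) for t in root.xpath(CLEAN_XPATH)]

print(raw)      # includes the whitespace-only nodes around the <a>
print(cleaned)  # only the text nodes that carry a value
```

With the filter in place, taking the first result per item (for example with `.get()` in Scrapy) should no longer land on a whitespace-only node.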