
I'm currently trying to extract some text from a website with XPath and RapidMiner. I want to extract the "270 €" from the following code:

<dd class="grid-item three-fifths"> 
<span class="is1-operator">+</span> 
270 € 
</dd>

I tried the following, which didn't work:

//h:dd[@class='grid-item three-fifths']//text()

Thanks for your help :)

Marius

2 Answers


Your XPath returns 3 text nodes:

  1. a whitespace-only node (the line break before <span>)
  2. "+"
  3. "270 €"
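
To see where those three nodes come from, here is a minimal sketch using Python's standard-library ElementTree. This is an assumption: it is not RapidMiner, and ElementTree cannot evaluate `text()` steps itself, so the walk below imitates `//text()` by hand. ElementTree keeps the text before an element's first child in `.text` and the text after a child element in that child's `.tail`. `SNIPPET` and `descendant_text_nodes` are hypothetical names, and the fragment assumes bare `\n` line endings.

```python
import xml.etree.ElementTree as ET

# The question's fragment; assumption: each line ends in a bare "\n".
SNIPPET = (
    '<dd class="grid-item three-fifths">\n'
    '<span class="is1-operator">+</span>\n'
    '270 €\n'
    '</dd>'
)


def descendant_text_nodes(elem):
    """Text nodes under elem in document order, imitating elem//text()."""
    # Text before the first child element lives in .text ...
    nodes = [elem.text] if elem.text else []
    for child in elem:
        nodes.extend(descendant_text_nodes(child))
        # ... and text after a child element lives in that child's .tail.
        if child.tail:
            nodes.append(child.tail)
    return nodes


dd = ET.fromstring(SNIPPET)
nodes = descendant_text_nodes(dd)
for i, node in enumerate(nodes, start=1):
    print(i, repr(node))
```

Running it prints three nodes. The first is a lone line break rather than an empty string, which is why length-based filters need some care.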

Try the XPath below to fetch only "270 €":

//h:dd[@class='grid-item three-fifths']/text()[string-length() > 0]
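
A rough check of what that expression keeps follows. The assumptions are the same as before: stdlib ElementTree stands in for an XPath engine, the helper names are hypothetical, and the fragment uses bare `\n` line endings. `/text()` uses the child axis, so the "+" inside `<span>` is never selected. However, the whitespace node in front of the span still passes `string-length() > 0`.

```python
import xml.etree.ElementTree as ET

# The question's fragment; assumption: each line ends in a bare "\n".
SNIPPET = (
    '<dd class="grid-item three-fifths">\n'
    '<span class="is1-operator">+</span>\n'
    '270 €\n'
    '</dd>'
)


def child_text_nodes(elem):
    """Direct text children of elem, imitating elem/text()."""
    nodes = [elem.text] if elem.text else []
    # Text between or after child elements is still a child of elem.
    nodes.extend(child.tail for child in elem if child.tail)
    return nodes


dd = ET.fromstring(SNIPPET)
# text()[string-length() > 0]: len() counts characters, as string-length() does
selected = [t for t in child_text_nodes(dd) if len(t) > 0]
print(selected)
```
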
Andersson
  • Hey Andersson, thanks for your response. I tried the code you suggested; however, it still only returns question marks – Marius Sep 13 '17 at 21:15
  • Did you try this solution? Or have you found your own? – Andersson Sep 13 '17 at 21:21
  • Hey Andersson, I tried what you suggested and it still returned a question mark. So far, I haven't managed to solve the problem – Marius Sep 16 '17 at 13:06
  • I don't see a question mark in the provided HTML sample. Can you share the exact HTML? – Andersson Sep 16 '17 at 13:19
  • True, there is no question mark in the HTML; that's why I wonder why it doesn't work. The exact code is:
    + 270 €
    from the following website: https://www.immobilienscout24.de/expose/99020787?referrer=RESULT_LIST_LISTING&navigationServiceUrl=%2FSuche%2Fcontroller%2FexposeNavigation%2Fnavigate.go%3FsearchUrl%3D%2FSuche%2FS-T%2FWohnung-Miete%26exposeId%3D99020787&navigationHasPrev=false&navigationHasNext=true&navigationBarType=RESULT_LIST&searchId=d3565480-7ffe-38fc-b60c-9aa3dc88914b#/
    – Marius Sep 16 '17 at 19:13
  • The class name of `dd` is incomplete in the sample you provided initially. Try this one: `//dd[@class='is24qa-nebenkosten grid-item three-fifths']/text()[string-length() > 0]` – Andersson Sep 16 '17 at 19:27
  • Yep, I corrected that already, but it doesn't work. I shortened it in the initial post just to keep it simple. RapidMiner still returns question marks – Marius Sep 16 '17 at 21:31
  • I guess this is a RapidMiner issue, as the XPath is correct and FirePath returns the desired output with this XPath – Andersson Sep 18 '17 at 19:01
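
The class mismatch from the comments can be reproduced in a few lines. `[@class='...']` compares the entire attribute string, so any extra token makes it fail. This stdlib ElementTree sketch assumes the live `dd` carries the `is24qa-nebenkosten` token quoted above. It contrasts the exact comparison with a token-wise check, which is a Python analogue of the common XPath 1.0 idiom `contains(concat(' ', normalize-space(@class), ' '), ' three-fifths ')`. It says nothing about why RapidMiner returns question marks.

```python
import xml.etree.ElementTree as ET

# Assumption: the live page's <dd> has the extra class token from the
# comments; the rest mirrors the question's fragment.
SNIPPET = (
    '<dd class="is24qa-nebenkosten grid-item three-fifths">\n'
    '<span class="is1-operator">+</span>\n'
    '270 €\n'
    '</dd>'
)

dd = ET.fromstring(SNIPPET)
cls = dd.get("class")

# [@class='grid-item three-fifths'] compares the whole attribute string
exact_match = cls == "grid-item three-fifths"

# Token-wise test: is "three-fifths" one of the space-separated class names?
token_match = "three-fifths" in cls.split()

print(exact_match, token_match)
```
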

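As mentioned in the previous answer, a string-length filter can be used, but [string-length() > 0] does not remove the unwanted nodes. The line-break ("enter") text node has one character, and so does "+" when you select with //text().

[string-length() > 1] should work, as long as the whitespace-only node is a single line break with no extra spaces or indentation.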
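
A small sketch of why the length threshold depends on the exact whitespace in the page follows. It also shows `[normalize-space()]` as a predicate, an alternative not mentioned in the answer, which drops whitespace-only nodes regardless of their length. It uses stdlib ElementTree rather than an XPath engine. `dd_fragment` and the other helpers are hypothetical, and the two line-ending variants are assumptions about how the real page might be formatted.

```python
import xml.etree.ElementTree as ET

XPATH_WHITESPACE = " \t\r\n"  # the only whitespace characters XPath 1.0 normalizes


def dd_fragment(eol):
    """Hypothetical helper: the question's <dd>, with each line ending in eol."""
    return (
        f'<dd class="grid-item three-fifths">{eol}'
        f'<span class="is1-operator">+</span>{eol}'
        f'270 €{eol}'
        '</dd>'
    )


def child_text_nodes(elem):
    """Direct text children of elem, imitating elem/text()."""
    nodes = [elem.text] if elem.text else []
    nodes.extend(child.tail for child in elem if child.tail)
    return nodes


def by_length(nodes):
    # text()[string-length() > 1]
    return [t for t in nodes if len(t) > 1]


def by_normalize_space(nodes):
    # text()[normalize-space()]: a whitespace-only node normalizes to "",
    # which is false in a predicate, so it is dropped whatever its length
    return [t for t in nodes if t.strip(XPATH_WHITESPACE)]


results = {}
for eol in ("\n", " \n"):
    nodes = child_text_nodes(ET.fromstring(dd_fragment(eol)))
    results[eol] = (by_length(nodes), by_normalize_space(nodes))
    print(repr(eol), results[eol])
```

With bare line breaks, both filters keep only the price. Once a stray space precedes each line break, `string-length() > 1` also keeps the whitespace node, while the normalize-space test does not.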

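If you are sure about the item's position (in this case, it is the 3rd text node), put parentheses around the path so that [3] counts across all text nodes in the dd. Without them, [3] is applied to each element's own text children, and no element here has three:

(//dd[@class='grid-item three-fifths']//text())[3]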
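
The following stdlib ElementTree sketch contrasts the two readings of the position predicate: per parent, versus over the whole node set in document order. The helper names are hypothetical, bare `\n` line endings are assumed, and ElementTree itself cannot evaluate `text()`, so both selections are imitated by hand.

```python
import xml.etree.ElementTree as ET

# The question's fragment; assumption: each line ends in a bare "\n".
SNIPPET = (
    '<dd class="grid-item three-fifths">\n'
    '<span class="is1-operator">+</span>\n'
    '270 €\n'
    '</dd>'
)


def child_text_nodes(elem):
    """Direct text children of elem, imitating elem/text()."""
    nodes = [elem.text] if elem.text else []
    nodes.extend(child.tail for child in elem if child.tail)
    return nodes


def descendant_text_nodes(elem):
    """All text nodes under elem in document order, imitating elem//text()."""
    nodes = [elem.text] if elem.text else []
    for child in elem:
        nodes.extend(descendant_text_nodes(child))
        if child.tail:
            nodes.append(child.tail)
    return nodes


dd = ET.fromstring(SNIPPET)

# dd//text()[3]: [3] picks the 3rd text child of dd and of each descendant
# element separately; dd has two text children and span has one.
per_parent = [
    child_text_nodes(e)[2]
    for e in dd.iter()  # dd itself plus every descendant element
    if len(child_text_nodes(e)) >= 3
]

# (dd//text())[3]: [3] picks the 3rd node of the whole set in document order.
third = descendant_text_nodes(dd)[2]

print(per_parent, repr(third))
```
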

If you are sure it is always the last text node:

//dd[@class='grid-item three-fifths']/text()[last()]
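
For comparison, here is `text()[last()]` in the same stdlib sketch style: it is simply the final direct text child of `dd`. The selected node still carries the surrounding line breaks. Trimming them (roughly what `normalize-space()` would do in XPath) is an extra step, shown here with `str.strip`. Helper names are hypothetical, and bare `\n` line endings are assumed.

```python
import xml.etree.ElementTree as ET

# The question's fragment; assumption: each line ends in a bare "\n".
SNIPPET = (
    '<dd class="grid-item three-fifths">\n'
    '<span class="is1-operator">+</span>\n'
    '270 €\n'
    '</dd>'
)


def child_text_nodes(elem):
    """Direct text children of elem, imitating elem/text()."""
    nodes = [elem.text] if elem.text else []
    nodes.extend(child.tail for child in elem if child.tail)
    return nodes


dd = ET.fromstring(SNIPPET)
last_text = child_text_nodes(dd)[-1]  # dd/text()[last()]
# Trim XPath whitespace from both ends, roughly what normalize-space() does here
trimmed = last_text.strip(" \t\r\n")
print(repr(last_text), repr(trimmed))
```
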

You can also get the text node that follows the span inside the dd:

//dd[@class='grid-item three-fifths']//span/following-sibling::text()
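
In ElementTree terms, the text right after `<span>` is the span's `.tail`. `following-sibling::text()` collects that tail plus the tails of any later siblings. The stdlib-only sketch below uses the same assumptions as before: hypothetical helper names and bare `\n` line endings. The parent map is needed because ElementTree elements do not link to their parent.

```python
import xml.etree.ElementTree as ET

# The question's fragment; assumption: each line ends in a bare "\n".
SNIPPET = (
    '<dd class="grid-item three-fifths">\n'
    '<span class="is1-operator">+</span>\n'
    '270 €\n'
    '</dd>'
)


def following_sibling_text(parent, child):
    """Text nodes after child among parent's children (child/following-sibling::text())."""
    siblings = list(parent)
    start = siblings.index(child)
    # The text right after a sibling element is stored as that sibling's .tail
    return [s.tail for s in siblings[start:] if s.tail]


dd = ET.fromstring(SNIPPET)
# ElementTree elements have no parent pointer, so build a child -> parent map
parent_of = {child: parent for parent in dd.iter() for child in parent}

after_span = []
for span in dd.iter("span"):  # dd//span
    after_span.extend(following_sibling_text(parent_of[span], span))
print(after_span)
```
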

Or look for the euro sign:

//dd/text()[contains(.,'€')]
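
`//dd/text()[contains(.,'€')]` does not depend on position, but on a full page it matches every `dd` with a euro amount in its direct text. The sketch below illustrates that with a hypothetical two-entry fragment. The second `dd` and its "450 €" are made up for illustration, and stdlib ElementTree stands in for an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: the question's <dd> plus a second, made-up <dd>
PAGE = (
    '<dl>\n'
    '<dd class="grid-item three-fifths">\n'
    '<span class="is1-operator">+</span>\n'
    '270 €\n'
    '</dd>\n'
    '<dd class="grid-item three-fifths">\n'
    '450 €\n'
    '</dd>\n'
    '</dl>'
)


def child_text_nodes(elem):
    """Direct text children of elem, imitating elem/text()."""
    nodes = [elem.text] if elem.text else []
    nodes.extend(child.tail for child in elem if child.tail)
    return nodes


root = ET.fromstring(PAGE)
# //dd/text()[contains(., '€')]: every dd on the page, direct text children only
# (each match is trimmed of surrounding XPath whitespace for display)
matches = [
    t.strip(" \t\r\n")
    for dd in root.iter("dd")
    for t in child_text_nodes(dd)
    if "€" in t
]
print(matches)  # ['270 €', '450 €']
```
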
Oktay