I am trying to get the text in the header on this page:

iShares FTSE MIB UCITS ETF EUR (Dist)

The tag looks like this:

<h1 class="product-title" title="iShares FTSE MIB UCITS ETF EUR (Dist)"> iShares FTSE MIB UCITS ETF EUR (Dist) </h1>

I am using this xPath:

xp_name = ".//*[@class[contains(normalize-space(.), 'product-title')]]"

Retrieving via .text in Selenium WebDriver for Python:

new_name = driver.find_element_by_xpath(xp_name).text

The driver finds the xpath, but when I print new_name, macOS Terminal only prints a blank string: ""

What could be the reason for this?

Note: I also tried some other xpath alternatives, getting the same result, for example with:

xp_name = ".//*[@id='fundHeader']//h1"
P A N

2 Answers

The problem is that there are two h1 elements with exactly the same outer HTML: the first is hidden, the second is not. You can check this by counting the matches:

print(len(driver.find_elements_by_xpath('//h1[@class="product-title "]')))

The text property returns the text of visible elements only, whereas the textContent attribute also returns the text of hidden ones.

Try to replace

new_name = driver.find_element_by_xpath(xp_name).text

with

new_name = driver.find_element_by_xpath(xp_name).get_attribute('textContent')

or simply handle the second (visible) header:

driver.find_elements_by_xpath('//h1[@class="product-title "]')[1].text
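Rather than hard-coding index 1, one option is to pick whichever match is actually displayed and only fall back to textContent when none is. This is a minimal sketch, assuming Selenium-style element objects that expose .text, is_displayed() and get_attribute(); the helper name first_visible_text is hypothetical and not part of Selenium:

```python
def first_visible_text(elements):
    """Return the stripped text of the first displayed element.

    `elements` is assumed to be a list of Selenium WebElement-like objects,
    e.g. the result of driver.find_elements_by_xpath(...).
    If no element is displayed, fall back to the textContent of the first
    match; return "" when there are no matches at all.
    """
    for element in elements:
        # .text is empty for hidden elements, so check visibility first
        if element.is_displayed():
            return element.text.strip()
    if elements:
        # hidden elements still expose their text through textContent
        raw = elements[0].get_attribute("textContent")
        return (raw or "").strip()
    return ""
```

It could be called as first_visible_text(driver.find_elements_by_xpath('//h1[@class="product-title "]')). Note that recent Selenium 4 releases removed the find_elements_by_xpath shortcut in favour of driver.find_elements(By.XPATH, ...).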
Andersson
Might or might not be relevant: for me the problem was that the element I was trying to fetch just wasn't loaded yet, so a 'time.sleep(1)' fixed it. The way the website was set up, it wouldn't throw an error though. – Ahmad Moussa Jul 04 '20 at 19:48
Just want to say thank you for this answer. I spent a couple of hours trying to figure out why I got " " in my scraping results; only after I added .get_attribute('textContent') did I get my desired output. – Antonych Oct 11 '20 at 15:51
As @ahmad-moussa mentioned, for me too the solution was:

import time

(...)

time.sleep(1)
# before 
<webelement>.text
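A fixed one-second sleep can be too short on a slow page and wasteful on a fast one. A possible alternative is to poll until the text is non-blank or a timeout expires. The sketch below is plain Python with no Selenium import; wait_for_text, timeout and poll are hypothetical names chosen for illustration. Selenium also ships its own WebDriverWait class for this kind of waiting.

```python
import time

def wait_for_text(get_text, timeout=10.0, poll=0.25):
    """Call get_text() repeatedly until it returns a non-blank string.

    get_text is any zero-argument callable, for example
        lambda: driver.find_element_by_xpath(xp_name).text
    (driver and xp_name are assumed to exist in the caller's scope).
    Raises TimeoutError if nothing non-blank appears within `timeout` seconds.
    Exceptions raised by get_text itself are not caught here.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = get_text()
        if text and text.strip():
            return text.strip()
        if time.monotonic() >= deadline:
            raise TimeoutError("no non-blank text within %.1f s" % timeout)
        time.sleep(poll)  # short pause before the next attempt
```

This only helps when the element is slow to render; if the match is a permanently hidden element, .text stays empty and the call times out.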
Joel Mata