I am trying to extract some text from this page

In particular, I want to extract the text between the <pre> tags. I am using Selenium, and although the element is found, its text comes back as an empty string. This is the code I am using:

testo = driver.find_element_by_xpath('/html/body/span/pre[1]').text

What do you think the issue could be?

PhDing
    You're telling it to start looking from the root, but that stuff is all inside an ` – Tim Roberts Apr 14 '22 at 20:19

2 Answers

The text within the <pre> tag is inside an <iframe>.

So to extract the desired text you have to:

  • Induce WebDriverWait for the desired frame to be available and switch to it.

  • Induce WebDriverWait for the desired element to be visible.

  • You can use either of the following Locator Strategies:

    • Using CSS_SELECTOR:

      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#mainFrame")))
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.dettaglio_atto_testo"))).get_attribute("innerHTML"))
      
    • Using XPATH:

      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='mainFrame']")))
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='dettaglio_atto_testo']/pre"))).text)
      
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
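
As a rough illustration of why a lookup that starts at the top-level document cannot reach content inside a frame, here is a toy model that uses only the Python standard library, not Selenium. The two markup strings are hypothetical stand-ins for the outer page and the framed page; the real page's structure is assumed, not known.

```python
import xml.etree.ElementTree as ET

# Hypothetical outer page: the body holds only the frame element.
outer = ET.fromstring(
    "<html><body><iframe id='mainFrame' src='inner.html'/></body></html>"
)

# Hypothetical framed page: a separate document with its own root.
inner = ET.fromstring(
    "<html><body><span class='dettaglio_atto_testo'>"
    "<pre>Testo</pre></span></body></html>"
)

# The same path is evaluated against each document in turn.
path = "./body/span/pre"
found_in_outer = outer.find(path)   # nothing: the <pre> is not in this tree
found_in_inner = inner.find(path)   # the <pre> element of the framed page

print(found_in_outer)
print(found_in_inner.text)
```

Switching to the frame in Selenium plays the role of changing which of the two documents the path is evaluated against.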
    
undetected Selenium

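First, switch to the iframe. Then you can read the element's text with the .getText() method (Java bindings; in Python this is the .text property).

If that returns an empty string, try .getAttribute("innerText") instead (get_attribute("innerText") in Python).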
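
The fallback described above could be sketched in Python roughly as follows. The helper name `element_text` is hypothetical, and the extra `textContent` fallback is an assumption added for illustration. The function only relies on the element exposing `.text` and `.get_attribute()`, as a Selenium WebElement does.

```python
def element_text(element):
    """Return an element's text, falling back to DOM properties.

    `element` is assumed to behave like a Selenium WebElement:
    it has a `.text` property and a `.get_attribute(name)` method.
    """
    # .text returns only the rendered (visible) text and may be empty.
    text = element.text
    if text:
        return text

    # Fall back to DOM properties, which do not depend on visibility
    # in the same way. "textContent" is an extra fallback (assumption).
    for name in ("innerText", "textContent"):
        value = element.get_attribute(name)
        if value and value.strip():
            return value.strip()

    # Nothing usable was found.
    return ""
```

After switching into the frame, this would be called on the located element, for example the one found with the XPath //span[@class='dettaglio_atto_testo']/pre from the other answer.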

cruisepandey