I have the following html code:

<div class="jaHlC">
<div class="C" data-ft="true">
<div class="IuRIu">
<span>
<span class="biGQs _P fiohW uuBRH">
90 places sorted by traveler favorites</span>
</span>
<span class="nzZVd PJ">

I need to extract the text saying "90 places sorted by traveler favorites"

My Python code is below; it fails to extract the text:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html"

driver = webdriver.Firefox()
driver.get(url)

WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler'))).click()

# attempt 1 : does not work
#number = driver.find_element(By.XPATH, '//span[@class="biGQs _P fiohW uuBRH"]')

# attempt 2: does not work
#number = driver.find_element(By.XPATH, "/html/body/div[1]/main/div[1]/div/div[3]/div/div[2]/div[2]/div[2]/div/div/div[2]/div/div[2]/div/div/section[2]/div/div/div/span[1]/span")

# attempt 3: does not work either
number = driver.find_element(By.CSS_SELECTOR, "span.uuBRH")

Please suggest how I can extract the text. Thank you in advance.

R Sandy

2 Answers

They use multiple classes for a single element, and those class names can change dynamically, so a locator that depends on one specific class name may fail once the name changes.

To find the element using only part of the class name, we could do something like the code below. It waits up to 10 seconds for the element to become present; if the element doesn't appear within 10 seconds, a TimeoutException is raised.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
number = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "span[class*='fiohW uuBRH']")))
print(number.text)
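
A possible variation, as a sketch only: the selector above matches the literal substring 'fiohW uuBRH', so it depends on the two classes staying adjacent and in that order. Chaining one [class*=...] condition per fragment removes that dependency. The helper name class_contains_selector is hypothetical and not part of Selenium; it only builds the selector string.

```python
def class_contains_selector(tag: str, *class_fragments: str) -> str:
    """Build a CSS selector requiring every fragment to occur in the class attribute.

    Hypothetical helper for illustration; it is not a Selenium API.
    """
    if not class_fragments:
        raise ValueError("at least one class fragment is required")
    # One [class*='...'] condition per fragment, so the order of the
    # classes in the attribute no longer matters.
    conditions = "".join(f"[class*='{fragment}']" for fragment in class_fragments)
    return f"{tag}{conditions}"


selector = class_contains_selector("span", "fiohW", "uuBRH")
print(selector)
```

The resulting string can be passed as (By.CSS_SELECTOR, selector) to the same wait as above. It still breaks if the class names themselves are regenerated.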
Saxtheowl

Class attribute values such as biGQs, fiohW, and uuBRH are dynamically generated and are bound to change sooner or later. They may change the next time you load the application afresh, or even at the next application startup, so they can't be relied on in locators.


Solution

To extract the text 90 places sorted by traveler favorites, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR and text attribute:

    driver.get("https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section[data-automation=WebPresentation_WebSortDisclaimer] div > span > span"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get("https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//section[@data-automation='WebPresentation_WebSortDisclaimer']//div/span/span"))).get_attribute("innerHTML"))
    
  • Console output:

    90 places sorted by traveler favorites
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
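
Since the question stores the element in a variable called number, a follow-up step may be to pull the count out of the retrieved text. The sketch below assumes the text has already been read via .text or get_attribute("innerHTML") and that the count is the first run of digits in it; the function name first_count is hypothetical.

```python
import re
from typing import Optional


def first_count(text: str) -> Optional[int]:
    """Return the first integer found in text, or None if there is none.

    Commas inside the number (e.g. "1,234") are treated as thousands separators.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


print(first_count("90 places sorted by traveler favorites"))
```

With the text from the question this prints 90. Surrounding whitespace or newlines, such as the line break inside the span in the question's HTML, do not affect the result.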


undetected Selenium