
I have some HTML that I am scraping with Selenium, and I want to get the text inside the `<small>` tags. I cannot rely on XPath, because the XPath changes between examples. This is the HTML:

    <h3 class="price">
        $28.04
    <small>ex</small><br> <small>$30.84 <small>inc</small></small></h3>

I know I can use `price = driver.find_elements_by_class_name("price")` and then `price[1].text` to get the text, but I end up getting a Selenium WebElement instead:

    <selenium.webdriver.remote.webelement.WebElement (session="a95cede569123a83f5b043cd5e138c7c", element="a3cabc71-e3cf-4faa-8281-875f9e47d6a4")>

Is there a way to scrape the 30.84 text?

undetected Selenium
Daichi
  • Can you post the url? – Maran Sowthri Sep 22 '20 at 03:13
  • It's hard for me to post the URL because you need a login to access it. Is there any other info I can post? – Daichi Sep 22 '20 at 03:13
  • gotcha, post the full error message – Maran Sowthri Sep 22 '20 at 03:14
  • It seems I was mistaken; my code partly works. I get the text `$28.04 ex $30.84 inc`. Do you know if there is a way to get only the second price? The output is a 20-character string, so my solution would be to grab the last 10 characters, but that would not work if the number of digits changes. – Daichi Sep 22 '20 at 03:27
  • Try this `driver.find_element(By.XPATH, "//h3[@class='price']//small[contains(., '$')]").text`. – Dilip Meghwal Sep 22 '20 at 04:28
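Regarding the idea of slicing off the last 10 characters: a less brittle sketch is to pull every price out of the combined text with a regular expression and take the second one. This assumes each price is written as `$` followed by digits, optionally with comma thousands separators and a decimal part; `PRICE_RE` and `second_price` are hypothetical names used only for illustration.

```python
import re

# Assumed price format: "$", then digits (commas allowed), then an optional decimal part.
PRICE_RE = re.compile(r"\$[\d,]+(?:\.\d+)?")


def second_price(text):
    """Return the second $-price found in text, or None if there are fewer than two."""
    prices = PRICE_RE.findall(text)
    return prices[1] if len(prices) > 1 else None


# The combined text reported in the comment above.
print(second_price("$28.04 ex $30.84 inc"))  # $30.84
```

Unlike a fixed slice, this keeps working when the prices gain or lose digits, as long as the page keeps the "ex price first, inc price second" order.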

1 Answer


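The text $30.84 is a text node: it is the first child of the outer `<small>` element and is followed by the nested `<small>inc</small>`, so `.text` on that element returns `$30.84 inc`. To print only the price, wait with WebDriverWait for visibility_of_element_located() and then use either of the following locator strategies: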

  • Using XPATH and firstChild (the element's first child node):

    print(driver.execute_script('return arguments[0].firstChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='price']//small[.//small[text()='inc']]")))).strip())
    
  • Using XPATH and split():

    # innerHTML is "$30.84 <small>inc</small>"; keep only the part before the nested tag
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='price']//small[.//small[text()='inc']]"))).get_attribute("innerHTML").split("<", 1)[0].strip())
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
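To see offline why the outer `<small>` yields just `$30.84` while `.text` also picks up `inc`, here is a minimal sketch that uses Python's standard-library `html.parser` on the HTML from the question, with no browser involved. It records only the text that sits directly inside each `<small>`, which mirrors what `firstChild.textContent` returns for the outer one. `SmallTextParser` and `small_texts` are hypothetical names; in a live session the HTML string could come from something like `element.get_attribute("outerHTML")` or `driver.page_source`.

```python
from html.parser import HTMLParser


class SmallTextParser(HTMLParser):
    """Collect the text whose immediate parent is a <small> element."""

    def __init__(self):
        super().__init__()
        self.open_smalls = []   # indices into direct_texts for <small> tags still open
        self.direct_texts = []  # direct text of each <small>, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "small":
            self.direct_texts.append("")
            self.open_smalls.append(len(self.direct_texts) - 1)

    def handle_endtag(self, tag):
        if tag == "small" and self.open_smalls:
            self.open_smalls.pop()

    def handle_data(self, data):
        # Attribute text only to the innermost open <small>, not to its ancestors.
        if self.open_smalls:
            self.direct_texts[self.open_smalls[-1]] += data


def small_texts(html):
    parser = SmallTextParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.direct_texts]


HTML = """<h3 class="price">
    $28.04
<small>ex</small><br> <small>$30.84 <small>inc</small></small></h3>"""

print(small_texts(HTML))  # ['ex', '$30.84', 'inc']
```

The second entry is the text node that the `firstChild` strategy reads; the nested `inc` is kept separate because it belongs to the inner `<small>`.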
    


undetected Selenium