How do I use selenium to scrape text from a text node within a class through Python

Question

I am using Selenium to scrape some HTML, and I want the text inside the small tags. I cannot use a fixed XPath because it changes from one example to the next. This is the HTML:

<h3 class="price">
    $28.04
<small>ex</small><br> <small>$30.84 <small>inc</small></small></h3>

I know you can use price = driver.find_elements_by_class_name("price") and then price[1].text to get the text, but I end up getting a Selenium WebElement:

<selenium.webdriver.remote.webelement.WebElement (session="a95cede569123a83f5b043cd5e138c7c", element="a3cabc71-e3cf-4faa-8281-875f9e47d6a4")>

Is there a way to scrape the 30.84 text?

It's hard for me to post the URL because you need a login to access it. Is there any other info I can post? — Daichi, Sep 22 '20 at 03:13
It seems I was mistaken: my output does somewhat work. I get the text `$28.04 ex $30.84 inc`. Do you know if there is a way to get only the second price? The output is a 20-character string, so my solution would be to grab the last 10 characters, but that would not work if the number of digits changes. — Daichi, Sep 22 '20 at 03:27
Try this: `driver.find_element(By.XPATH, "//h3[@class='price']//small[contains(., '$')]").text`. — Dilip Meghwal, Sep 22 '20 at 04:28
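
If the combined element text is already available, as in the comment above, one option is to pull the amounts out with a regular expression instead of slicing a fixed number of characters. This is only a sketch: `PRICE_PATTERN` and `extract_prices` are hypothetical names, and the pattern assumes prices are written as a dollar sign followed by digits, optional thousands commas, and an optional decimal part.

```python
import re

# Matches a dollar amount such as "$30.84" or "$1,324.95" and captures
# the numeric part (assumed price format, adjust for other currencies).
PRICE_PATTERN = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")


def extract_prices(text):
    """Return every dollar amount found in text, in order, as strings."""
    return PRICE_PATTERN.findall(text)


# The element text reported in the comment above.
element_text = "$28.04 ex $30.84 inc"

prices = extract_prices(element_text)
inc_price = prices[-1]  # the "inc" price is the last amount in the text
print(inc_price)
```

Because the pattern keys on the dollar sign rather than on character positions, it keeps working when the number of digits changes.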

score 1 · Accepted Answer · answered Sep 22 '20 at 09:45

The text 30.84 is within a text node that is a direct child of the outer <small> element. To print it, induce WebDriverWait for visibility_of_element_located() and use either of the following locator strategies:

Using XPATH and firstChild (the first entry of childNodes):

print(driver.execute_script('return arguments[0].firstChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='price']//small[.//small[text()='inc']]")))).strip())

Using XPATH and splitlines() (this assumes the price sits on the second line of the element's innerHTML):

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='price']//small[.//small[text()='inc']]"))).get_attribute("innerHTML").splitlines()[1])
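
The splitlines() variant depends on where the page puts its line breaks. A sketch of an alternative that takes whatever text precedes the first tag in the innerHTML string is shown below. `leading_text` is a hypothetical helper name, and the sample string is the inner markup of the outer <small> element from the question.

```python
from html import unescape


def leading_text(inner_html):
    """Return the text that precedes the first tag in an innerHTML string."""
    # Everything before the first "<" is the leading text node.
    text = inner_html.split("<", 1)[0]
    # Decode entities such as &amp; and trim surrounding whitespace.
    return unescape(text).strip()


# Inner markup of the outer <small> element from the question.
print(leading_text("$30.84 <small>inc</small>"))
```

In the Selenium code above, the string passed to the helper would be the value returned by get_attribute("innerHTML").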

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
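
As a variation on the firstChild approach, the JavaScript can collect every direct text node of the element rather than only the first one, which may be more robust if the markup gains a leading child element. This is a sketch under assumptions: `JS_DIRECT_TEXT` and `direct_text` are hypothetical names, `driver` is any object exposing Selenium's execute_script method, and `element` is a WebElement located beforehand.

```python
# JavaScript that joins the text nodes that are direct children of the
# element passed as arguments[0], ignoring text inside nested tags.
JS_DIRECT_TEXT = """
const el = arguments[0];
return Array.from(el.childNodes)
    .filter(node => node.nodeType === Node.TEXT_NODE)
    .map(node => node.textContent)
    .join('')
    .trim();
"""


def direct_text(driver, element):
    """Return only the text held directly by element, not by its children."""
    return driver.execute_script(JS_DIRECT_TEXT, element)
```

With the imports above, the element could be obtained with the same wait used in the answer, for example `WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='price']//small[.//small[text()='inc']]")))`, and then passed to `direct_text(driver, element)`.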
