Extract text from Text Node using XPath

Question

I am new to XPath and trying to capture the values "Time: " and "13:45" from the following HTML snippet. Any help or suggestion will be really useful. Thank you!

<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>

I can access the label within the <strong> container with the pattern below, but I cannot figure out how to get the time value within the <p> container.

Label xpath:

//div[@class="inner-box"]/p[@class="inner-info-blk"]/strong

If you are using Selenium, you can get the info with this: driver.find_element(By.XPATH, '//*[@class="inner-info-blk"]').text — Mohit Kumar, Jun 10 '23 at 14:42

undetected Selenium · Answer 1 · 2023-06-10T23:21:05.580

Given the HTML:

<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>

The time value, i.e. 13:45, is within a Text Node and is the lastChild of its parent <p>. So to extract the desired text you can use either of the following locator strategies:

Using xpath, execute_script() and textContent:

print(driver.execute_script('return arguments[0].lastChild.textContent;', driver.find_element(By.XPATH, "//div[@class='inner-box']/p[@class='inner-info-blk']")).strip())

Using css_selector, get_attribute() and splitlines():

print(driver.find_element(By.CSS_SELECTOR, "div.inner-box > p.inner-info-blk").get_attribute("innerHTML").splitlines()[2])
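
To see why index 2 is the right line, the splitlines() step can be reproduced on a plain string. This is only a sketch: inner_html is a hypothetical stand-in for the value returned by get_attribute("innerHTML"), and it assumes the browser hands back the markup of the <p> element with the same line breaks as the snippet above, which may not hold for every page.

```python
# Stand-in for the string returned by get_attribute("innerHTML") on the <p>
# element (assumption: whitespace is preserved exactly as in the snippet).
inner_html = '\n        <strong>Time: </strong>\n        "13:45"\n    '

lines = inner_html.splitlines()
# lines[0] is the empty text before the first line break,
# lines[1] holds the <strong> label, lines[2] holds the time value.
time_value = lines[2].strip()
print(time_value)
```

Because the index depends on how the markup is laid out, this approach stops working if the page serves the same HTML on a single line.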

Alternative

As an alternative you can also use Beautiful Soup as follows:

Code Block:

from bs4 import BeautifulSoup

html_text = '''
<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>
'''

soup = BeautifulSoup(html_text, 'html.parser')

last_text = soup.find("p", {"class": "inner-info-blk"}).contents[2]
print(last_text.strip())

Console Output:

"13:45"

Yubo · Answer 2 · 2023-06-10T23:36:40.063

You can use text() to get the text from an element.

from lxml import etree

html = '''
<div class="inner-box">
<p class="inner-info-blk">
    <strong>Time: </strong>
    "13:45"
</p>
</div>
'''

x = etree.HTML(html)
result = x.xpath('//div[@class="inner-box"]/p[@class="inner-info-blk"]/text()[2]') # second text node of <p>, i.e. the one after <strong>
print(result[0].strip()) # xpath() returns a list, so take the first item

That returns the text from the <p> element.
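
The same text node can also be reached without text(): in ElementTree-style APIs, text that follows a child element is stored as that child's tail. The sketch below uses Python's standard-library xml.etree.ElementTree rather than lxml, and it assumes the snippet is well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Same snippet as in the question; it happens to be well-formed XML.
html = '''
<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>
'''

root = ET.fromstring(html.strip())  # root is the <div>
strong = root.find('./p[@class="inner-info-blk"]/strong')

label = strong.text.strip()       # text inside <strong>
time_value = strong.tail.strip()  # text node that follows <strong>
print(label, time_value)
```

Reading tail avoids counting text nodes, so it keeps working if extra whitespace appears before the <strong> element.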

UPDATE: As @shailesh has mentioned, a Selenium locator cannot evaluate an XPath expression that returns a text node, and to the best of my knowledge Selenium has no method that evaluates an arbitrary XPath expression. As an alternative, you can use a bit of JS here:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(
    "file:///C:/Users/yubo/data/social/stackoverflow/6.10%20Selenium/example.html"
)
time = driver.find_element(
    By.XPATH,
    './/div[@class="inner-box"]/p[@class="inner-info-blk"]',
)
print(driver.execute_script("return arguments[0].lastChild.textContent", time).strip()) # Same as @undetected selenium; a coincidence where we happened to write at the same time.
driver.quit()

Thank you Yubo, I appreciate it. The code itself works fine, but I am getting this error "AttributeError: 'WebElement' object has no attribute 'xpath'" when using it in the loop where I extract the value from all the elements. Previously I was trying: time = entry.find_element(By.XPATH, './/div[@class="inner-box"]/p[@class="inner-info-blk"]').text — Anaras, Jun 10 '23 at 14:53
@Yubo Ah, `return arguments[0].lastChild.textContent` it's purely _**plagiarized**_ content from [my answer](https://stackoverflow.com/a/76448423/7429447) — undetected Selenium, Jun 10 '23 at 23:10

shailesh · Answer 3 · 2023-06-10T20:16:57.787

You can solve this with the split method, because Selenium locators do not accept an XPath that ends in text(). "Time:" in your example is a static and unique value, so you can split on it to get the actual time value you expect. I would recommend trying XPath first and, if that does not lead to a solution, falling back to string logic. Maybe this can help you.

from selenium import webdriver
from selenium.webdriver.common.by import By


driver = webdriver.Firefox()
driver.get('https://www.yourpage.html')
time = driver.find_element(By.XPATH,"//p")

print(time.text.split("Time:")[1].strip())

driver.quit()

O/P: "13:45"

This may also be relevant:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('file:///Users/shakava/Downloads/stackoverflow.html')
time = driver.find_element(By.XPATH,"//p")
arr = time.text.split(":")

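START = 1  # skip arr[0], the "Time" label
timeVal = ""

# Re-join the remaining pieces, restoring the colons removed by split(":")
for index, item in enumerate(arr[START:], START):
    if index > START:
        timeVal += ":"
    timeVal += item

timeVal = timeVal.strip()  # drop the space left after "Time:"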

print(timeVal)
driver.quit()

O/P: "13:45"
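
The re-joining loop is needed because the time value itself contains a colon. As a minimal sketch, the same result can be reached by limiting the split to the first colon with the maxsplit argument of str.split(). Here element_text is a hypothetical stand-in for what time.text returns, assumed to be Time: "13:45" on a single line.

```python
# Stand-in for time.text (assumption: Selenium returns the visible text
# of the <p> element on one line).
element_text = 'Time: "13:45"'

# Split only at the first colon so the colon inside the time is kept.
label, time_value = element_text.split(":", 1)
time_value = time_value.strip()   # drop the space after "Time:"
print(time_value)
print(time_value.strip('"'))      # the same value without the literal quotes
```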
