Unfortunately, Selenium is built to work only with Elements, not Text nodes. If you try to use a function like find_element_by_xpath to target a Text node directly, Selenium will throw an InvalidSelectorException.
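For example, a call like the one below fails. This is just a sketch that assumes an already-initialized driver and a page containing an element with the id a (substitute your own locator):
from selenium.common.exceptions import InvalidSelectorException

try:
    # The XPath is syntactically valid, but it selects a Text node
    # rather than an Element, so Selenium refuses to return it.
    driver.find_element_by_xpath("//*[@id='a']/text()")
except InvalidSelectorException as err:
    print("Selenium rejected the text-node XPath:", err)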
One workaround is to grab the relevant HTML with Selenium and then use an HTML parsing library like Beautiful Soup that can handle text nodes more elegantly.
import bs4
from bs4 import BeautifulSoup

# Locate the target element once (Selenium 3-style locator method).
element = driver.find_elements_by_css_selector('#a')[0]

# innerHTML gives just the element's children; outerHTML includes the element itself.
inner_html = element.get_attribute("innerHTML")
inner_soup = BeautifulSoup(inner_html, 'html.parser')
outer_html = element.get_attribute("outerHTML")
outer_soup = BeautifulSoup(outer_html, 'html.parser')
From there, you have several ways to search for the text content. You'll have to experiment to see what works best for your use case.
Here's a simple one-liner that may be sufficient:
inner_soup.find(text=True)
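For example, given a made-up snippet like Hello <b>world</b> as the inner HTML, the one-liner returns the first Text node it finds:
example_soup = BeautifulSoup("Hello <b>world</b>", 'html.parser')
print(example_soup.find(text=True))  # prints "Hello "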
If that doesn't work, then you can loop through the element's child nodes with the .contents attribute and check each one's object type.
Beautiful Soup has four types of objects, and the one that you'll be interested in is the NavigableString type, which is produced by Text nodes. By contrast, Elements will have a type of Tag.
contents = inner_soup.contents

for bs4_object in contents:
    if type(bs4_object) == bs4.Tag:
        print("This object is an Element.")
    elif type(bs4_object) == bs4.NavigableString:
        print("This object is a Text node.")
Note that Beautiful Soup doesn't support XPath expressions. If you need those, then you can use some of the workarounds in this question.
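One such workaround, shown here only as a sketch and assuming you're willing to add lxml as a dependency, is to hand the HTML you already grabbed with Selenium to lxml, which does support XPath expressions that end in text():
from lxml import html as lxml_html

# Parse the outerHTML captured earlier and select the element's
# direct Text-node children with an ordinary XPath expression.
tree = lxml_html.fromstring(outer_html)
text_nodes = tree.xpath("//*[@id='a']/text()")
print(text_nodes)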