Python / Selenium / Parsel: locate all text within a div, including a varying number of spans

Question

I am working with Python and Selenium. Using an XPath class selector, I am currently able to locate a specific div that contains text I wish to record.

The problem is that this div can be empty (which I already handle) or contain between one and three spans of text that I cannot access. What I am trying to do is pull all of the text, including the text within the spans.

Example HTML:

```html
<div class="desktop-product-list-item__PotencyInfo-sc-8wto4u-14 hdncuE">
    <span class="grey-caps-text-sc-91lz0n-0 huWIpn">TAC</span>
    &nbsp;28.3%&nbsp;&nbsp;|&nbsp;&nbsp;
    <span class="grey-caps-text-sc-91lz0n-0 huWIpn">THC:</span>
    &nbsp;26.2%
</div>
```

Current XPath:

```python
potencyList = response1.xpath('//div[contains(@class, "__PotencyInfo-sc-")]/text()').getall()
```

With my current XPath I only get the numbers "28.3" and "26.2", because the "TAC:" and "THC:" labels are inside spans. My goal is to pull all of the text, in order, and then process it further with regex as needed.

I believe I could target the spans directly, but I am unsure how to do so when their number varies.

I should also note that I am using ChromeDriver.

Any insight would be greatly appreciated.

Have you tried using [innerText](https://stackoverflow.com/questions/30204029/given-a-python-selenium-webelement-can-i-get-the-innertext) to get this information? — Nick ODell, Apr 14 '23 at 00:19
@NickODell I did try this at one point but it kept returning nothing, likely due to me using it incorrectly. — T0ne, Apr 14 '23 at 00:24
What is `response1.xpath().getall()`? That's not selenium code. — JeffC, Apr 14 '23 at 00:24
Remove the `/text()` from the end of the XPath... does that work? `/text()` selects only the div's own (direct child) text nodes, which excludes the text inside the SPANs. — JeffC, Apr 14 '23 at 00:26
@JeffC It comes from `from parsel import Selector`, I believe. When I remove the `/text()` it gives me the full HTML. I took a shot at making that approach work with regex, but it got a bit beyond my grasp trying to deal with the various ways the data could be presented. — T0ne, Apr 14 '23 at 00:29
My main problem was not understanding how my script worked, these comments, questions and answers assisted me with that. As did https://stackoverflow.com/questions/26564843/scrapy-get-the-entire-text-including-children — T0ne, Apr 14 '23 at 01:05

score 1 · Answer 1 · answered Apr 14 '23 at 00:34

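Using Selenium, you can read the element's rendered text with `.text`:

```python
from selenium.webdriver.common.by import By
potency_list = driver.find_element(By.XPATH, '//div[contains(@class, "__PotencyInfo-sc-")]').text
```

Given the HTML provided, printing `potency_list` would output

```
TAC: 28.3%  |  THC: 26.2%
```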
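Since the question mentions processing the text further with regex, a sketch like the following could turn a string such as the one above into label/value pairs. The label pattern `[A-Z]+` and the name `parse_potency` are assumptions based on the single example in the question. The colon is optional (the example HTML shows both `TAC` and `THC:`), `\s` also matches `&nbsp;` (U+00A0), and a string with no pairs, such as the text of an empty div, gives an empty dict.

```python
import re

# Assumed format: an uppercase label, an optional colon, optional whitespace
# (including U+00A0), then a number followed by "%".
POTENCY_RE = re.compile(r"([A-Z]+):?\s*(\d+(?:\.\d+)?)%")


def parse_potency(text):
    """Map each label in a potency string to its percentage as a float.

    Works for any number of label/value pairs, including none.
    """
    return {label: float(value) for label, value in POTENCY_RE.findall(text)}


print(parse_potency("TAC: 28.3%  |  THC: 26.2%"))  # {'TAC': 28.3, 'THC': 26.2}
print(parse_potency(""))  # {}
```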

JeffC
