
I am working with Python and Selenium. Using an XPath class selector, I can currently locate a specific div that contains text I wish to record.

The problem is that this div can be empty (which I currently handle) or contain one to three spans of text that I cannot access. What I am trying to do is pull all of the text, including the text within the spans.

Example HTML:

<div class="desktop-product-list-item__PotencyInfo-sc-8wto4u-14 hdncuE">
    <span class="grey-caps-text-sc-91lz0n-0 huWIpn">TAC</span>
    &nbsp;28.3%&nbsp;&nbsp;|&nbsp;&nbsp;
    <span class="grey-caps-text-sc-91lz0n-0 huWIpn">THC:</span>
    &nbsp;26.2%
</div>

Current XPATH:

potencyList = response1.xpath('//div[contains(@class, "__PotencyInfo-sc-")]/text()').getall()

With my current XPath I only pull the numbers "28.3" and "26.2", because the "TAC:" and "THC:" labels are inside spans. My goal is to pull all of the text, in order, and then manipulate it further with regex as needed.

I believe I could target the spans directly, but I am unsure how to do so given that their number varies.

I should also note that I am using ChromeDriver.

Any insight would be greatly appreciated.

T0ne
  • Have you tried using [innerText](https://stackoverflow.com/questions/30204029/given-a-python-selenium-webelement-can-i-get-the-innertext) to get this information? – Nick ODell Apr 14 '23 at 00:19
  • @NickODell I did try this at one point but it kept returning nothing, likely due to me using it incorrectly. – T0ne Apr 14 '23 at 00:24
  • What is `response1.xpath().getall()`? That's not selenium code. – JeffC Apr 14 '23 at 00:24
  • Remove the `/text()` from the end of the XPath... does that work? `/text()` is specifically targeting text nodes which would exclude the SPANs. – JeffC Apr 14 '23 at 00:26
  • @JeffC It comes from `from parsel import Selector`, I believe. When I remove the `/text()` it gives me the full HTML. I took a shot at making that approach work with regex, but it got a bit out of my grasp trying to deal with the various ways the data could present itself. – T0ne Apr 14 '23 at 00:29
  • My main problem was not understanding how my script worked. These comments, questions and answers helped me with that, as did https://stackoverflow.com/questions/26564843/scrapy-get-the-entire-text-including-children – T0ne Apr 14 '23 at 01:05
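
For readers following the comment thread, here is a minimal sketch of one way the text could be cleaned up once every text node has been collected. It assumes the selector is changed from `/text()` (direct child text nodes only) to `//text()` (all descendant text nodes, including those inside the spans), which is the usual parsel/Scrapy idiom. The `fragments` list is a hypothetical stand-in for what `.getall()` would be expected to return for the example HTML, and the helper names `normalize_text` and `parse_potency` are illustrative, not part of any library.

```python
import re


def normalize_text(fragments):
    """Join text fragments and collapse all whitespace into single spaces."""
    # str.split() with no arguments splits on any Unicode whitespace, so it
    # also removes the \xa0 characters that &nbsp; entities decode to.
    return " ".join("".join(fragments).split())


def parse_potency(text):
    """Return (label, value) pairs such as ("THC", 26.2) found in the text."""
    # The colon after the label is optional because the example HTML has
    # "TAC" without one and "THC:" with one.
    pairs = re.findall(r"([A-Za-z]+):?\s*(\d+(?:\.\d+)?)%", text)
    return [(label, float(value)) for label, value in pairs]


# Hypothetical stand-in for the assumed result of:
#   response1.xpath(
#       '//div[contains(@class, "__PotencyInfo-sc-")]//text()'
#   ).getall()
fragments = [
    "\n    ",
    "TAC",
    "\n    \xa028.3%\xa0\xa0|\xa0\xa0\n    ",
    "THC:",
    "\n    \xa026.2%\n",
]

text = normalize_text(fragments)
print(text)
print(parse_potency(text))

# An empty div produces no fragments and therefore no pairs.
print(parse_potency(normalize_text([])))
```

Because the regex simply collects every label/percentage pair it finds, the same code should cope with zero, one, two, or three spans without counting them in advance.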

1 Answer


Using Selenium, you can do

potency_list = driver.find_element(By.XPATH, '//div[contains(@class, "__PotencyInfo-sc-")]').text

which, given the HTML provided, would set potency_list to

TAC: 28.3%  |  THC: 26.2%
JeffC