
Consider this HTML page (the whitespace is intentional):

<html>
<body>
<div>
<div id="1"><span class="3">$200</span></div>
</div>
</div>
<div id="1"><span class="2">$250</span></div>
</div>
<div><span class="1">                        $400                     </span>
</div>
</body>
</html>

Now, let's say that in Python Selenium I want to find every currency amount on this page. I would then like to read various CSS properties from the elements that contain them.

What I have tried (which seems to be the wrong way) is to use a regular expression with re.finditer to search the HTML source for these amounts. For each match I then call Selenium's find_elements method.
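As a minimal sketch of that first step, the snippet below runs the pattern over a string standing in for the page source. The `html_source` literal is an assumption made for illustration (in the real script it would be `driver.page_source`). Note that the `$` inside the lookbehind has to be escaped; unescaped, it is an end-of-string anchor rather than a literal dollar sign.

```python
import re

# Hypothetical stand-in for driver.page_source, based on the sample page.
html_source = """
<div id="1"><span class="3">$200</span></div>
<div id="1"><span class="2">$250</span></div>
<div><span class="1">            $400            </span></div>
"""

# \$ matches a literal dollar sign; a bare $ would be an end-of-string anchor.
CURRENCY = re.compile(r"(?<=\$)\d{1,5}(?:,\d{3})?(?:\.\d+)?")

# Re-attach the "$" that the lookbehind leaves out of the match.
amounts = ["$" + match.group() for match in CURRENCY.finditer(html_source)]
print(amounts)
```

Each entry in `amounts` is the text that later gets passed to the XPath lookup.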

Here is the issue: finditer finds 3 matches. But when I take the matched text from finditer (in this case $200, $250 and $400) and pass it to find_elements, I get several elements for each match. It appears to be returning each of the parent tags as a separate result:

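import re

# Assumes `driver` is an active WebDriver and htmlsource = driver.page_source
currencypattern = re.compile(r"(?<=\$)\d{1,5}(?:,\d{3})?(?:\.\d+)?")
for currencies in currencypattern.finditer(htmlsource):
    expr = "$" + currencies.group()
    # Matches every element whose text (descendants included) contains the amount
    for i in driver.find_elements("xpath", '//*[contains(normalize-space(), "' + expr + '")]'):
        print(currencies.group(), " - ", i.tag_name)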

The above code prints the currency amount followed by a tag name, once for every matching element, all the way up to the top-level parent tag.
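The reason the parents show up is that `contains(normalize-space(), ...)` tests an element's string value, which includes the text of all its descendants. The sketch below illustrates the difference with the standard library's ElementTree rather than Selenium; the `PAGE` markup is a hypothetical, well-formed variant of the sample page, and `own_text` is a helper invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed variant of the sample page (nesting is assumed).
PAGE = """<html><body>
<div><div><div id="1"><span class="3">$200</span></div></div></div>
<div><span class="1">      $400      </span></div>
</body></html>"""

root = ET.fromstring(PAGE)
needle = "$200"

# String value = own text plus all descendant text, so every ancestor matches.
by_string_value = [el.tag for el in root.iter() if needle in "".join(el.itertext())]

def own_text(el):
    """Text nodes that are direct children of el (descendant text excluded)."""
    return (el.text or "") + "".join(child.tail or "" for child in el)

# Only the element that directly holds the text matches.
by_own_text = [el.tag for el in root.iter() if needle in own_text(el)]

print(by_string_value)
print(by_own_text)
```

In XPath terms, testing the element's own text nodes corresponds to a predicate on `text()` rather than on the element itself, for example `//*[text()[contains(normalize-space(.), "$200")]]`.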

If I use find_element instead of find_elements, it always seems to return the top-level tag, when what I really want is the innermost child element and its CSS properties.

Does anybody know how I can achieve this? I thought perhaps I could use the regex directly in the find_elements XPath, but so far I've had no joy with this.
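One possible direction, sketched here under assumptions rather than confirmed against a real page: restrict the XPath to elements that have a matching text node as a direct child, then read computed styles with `value_of_css_property`. The helper names `innermost_xpath` and `css_for_text` are hypothetical, `driver` is assumed to be an already-running Selenium WebDriver, and the default property list is only an example.

```python
def innermost_xpath(text):
    """XPath for elements with a direct text-node child containing `text`.

    Assumes `text` contains no double quotes (true for amounts like "$200").
    """
    return f'//*[text()[contains(normalize-space(.), "{text}")]]'


def css_for_text(driver, text, properties=("font-size", "color")):
    """Return (tag_name, {property: computed value}) for each innermost match."""
    results = []
    # "xpath" is the locator string that By.XPATH stands for.
    for element in driver.find_elements("xpath", innermost_xpath(text)):
        styles = {name: element.value_of_css_property(name) for name in properties}
        results.append((element.tag_name, styles))
    return results
```

With a live session this might be called as `css_for_text(driver, "$200")`. If the same amount appears in several places on the page, each occurrence comes back as a separate entry, so the styles can be compared side by side.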

Thanks in advance.

  • So, just to understand: you are parsing HTML with regex? – Barry the Platipus Aug 19 '22 at 10:10
  • Yes, currently. I'm sure there is a better way to do this, though. – Milinekticker1 Aug 19 '22 at 10:23
  • I am not sure I understood correctly, but wouldn't this simple thing work? `driver.find_elements(By.XPATH, "//*[contains(text(), '$')]")` I tried your HTML code and was able to find all 3 elements with this locator strategy. – Anand Gautam Aug 19 '22 at 10:31
  • Thank you, you are right, but here is the crux of the issue: when using the above locator strategy, the CSS properties I get back don't seem to match what I see in the element inspector. Here is an example: https://www.kiddies-kingdom.com/travel-cots/39863-graco-contour-electra-travel-cot-suits-me.html It uses £ instead of $, but if you look at the few prices on that page, the font-size for the main price (£119.99) is 26px, yet the locator strategy above finds the £119.99 text and reports its font-size as 15px. – Milinekticker1 Aug 19 '22 at 10:54
  • To add to this, if I use my original method (regex on the HTML source, then find_elements for each regex match) and iterate through all the font-sizes, I eventually find the correct one in there (in this case, 26px). – Milinekticker1 Aug 19 '22 at 11:00
