I'm new to web scraping and have been using Selenium for this project. I'm crawling through the listings on a website, which are structured as follows:
Listing 1:
<html>
<div class="div_class">
<i class="first_i_class" style="i_style"> ::before </i>
First Category:
<span class="span_class">5</span>
<br>
<i class="second_i_class" style="i_style"> ::before </i>
Second Category:
<span class="span_class">3</span>
<br>
</div>
</html>
As you can see, the values for the first and second categories are similar, so finding all elements and then using a regex won't work here. I need to be able to get the text (5 and 3, in this example) based on the preceding text, in this case "First Category: " or "Second Category: ". Some listings, however, might skip certain categories and look like this...
Listing 2:
<html>
<div class="div_class">
<i class="third_i_class" style="i_style"> ::before </i>
Third Category:
<span class="span_class">7</span>
<br>
</div>
</html>
Because the categories change between listings, I don't think I can use something like:
cat_2_value = browser.find_element_by_xpath("/html/div/span[2][@class='span_class']")
because the xpath will also change. Is there a way that I can find the text in a given span based on either
- The preceding text (like "First Category: ") or
- The preceding <i> class (like "first_i_class")?
Any help or clarifying questions are much appreciated!
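To make the two options concrete, here is a rough sketch of what I imagine the lookups could look like, assuming XPath sibling axes are the right tool. The helper names (xpath_by_icon_class, xpath_by_label, span_text_or_none) are made up for illustration, the class match is exact, and the label is assumed to contain no single quotes:

```python
def xpath_by_icon_class(icon_class):
    """XPath for the first <span> sibling after the <i> with this exact class."""
    return f"//i[@class='{icon_class}']/following-sibling::span[1]"


def xpath_by_label(label):
    """XPath for the <span> whose nearest preceding text node contains the label."""
    # Assumes the label contains no single quotes, since it is embedded
    # directly in an XPath string literal.
    return (
        "//span[@class='span_class']"
        f"[contains(preceding-sibling::text()[1], '{label}')]"
    )


def span_text_or_none(driver, xpath):
    """Return the text of the first match, or None if the category is absent."""
    # The string "xpath" is the value of Selenium's By.XPATH constant.
    # find_elements returns an empty list instead of raising when nothing matches.
    matches = driver.find_elements("xpath", xpath)
    return matches[0].text if matches else None
```

With the browser object from above, span_text_or_none(browser, xpath_by_icon_class("first_i_class")) should give the value for Listing 1 and None for a listing that skips that category, and xpath_by_label("First Category") would be the text-based equivalent.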
tag? I think that would include both the category and the value? But I'm not sure if there is an easier way. – DRo Jun 29 '20 at 10:04