I'm new to scraping. I am trying to extract two values that sit inside a
quoted attribute of a tag. If I type:

for rating in flex.find_all("div", class_="article-seller-rating m-t-1"):
     print(rating)

I get the following:

<div class="article-seller-rating m-t-1">
<div>
<span class="rating relative js-popover pointer" data-container="body" data-content="Buyers rated this dealer:&lt;br&gt;&lt;br&gt;4.9 out of 5 stars&lt;br&gt;Number of reviews: 18" data-placement="top" data-title="Reviews">
<More un-important stuff here>
</span></div>
</div>

I want to extract 2 elements:

4.9 (the review rating), and

18 (the number of reviews)

Any help is greatly appreciated!

Dan K

2 Answers

I figured out a solution.

I also ran into an issue where one or both of those values were sometimes missing, and I wanted to record that, since I was compiling the data into a list.

I select the numbers in the "data-content" attribute, using r'\d+(?:\.\d+)?' for the first value (which has a decimal point) and r'\d+(?:,\d+)?' for the second value (which may contain a comma). Taking match [0] for the rating and match [3] for the review count also skips the 5 in "out of 5 stars". The "except IndexError:" handles the case where a value is missing.
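
As a quick check of the two patterns described above, here is a minimal sketch that runs them on the sample attribute value from the question. The variable names `content`, `decimals` and `integers` are only illustrative; in the scraping loop the string would come from `.get("data-content")`.

```python
import re

# Decoded value of the span's data-content attribute, copied from the question
content = "Buyers rated this dealer:<br><br>4.9 out of 5 stars<br>Number of reviews: 18"

# Numbers with an optional decimal part
decimals = re.findall(r'\d+(?:\.\d+)?', content)
# Integers with an optional comma group (the dot splits 4.9 into 4 and 9)
integers = re.findall(r'\d+(?:,\d+)?', content)

print(decimals)  # ['4.9', '5', '18']
print(integers)  # ['4', '9', '5', '18']
```

This is why index 0 of the first list is the rating and index 3 of the second list is the review count.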

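import re

for rating in flex.find_all("div", class_="article-seller-rating m-t-1"):
    try:
        # The popover text lives in the span's data-content attribute
        starrate = rating.select('span')[0].get("data-content")

        # First number with an optional decimal part: the star rating
        stars = re.findall(r'\d+(?:\.\d+)?', starrate)
        s = stars[0]
        master_list[c].append(s)

        # Integers with an optional comma: 4, 9, 5, then the review count
        ratings = re.findall(r'\d+(?:,\d+)?', starrate)
        r = ratings[3]
        master_list[c].append(r)

    except IndexError:
        # A value is missing; record that here (handler body not shown in the post)
        pass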
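
A possible variation, not part of the answer above: instead of relying on match positions, anchor each pattern on the surrounding text ("out of 5 stars", "Number of reviews:") and return None for whatever is absent. This is only a sketch under the assumption that those labels are stable on the site; `parse_rating` is a hypothetical helper name.

```python
import re

def parse_rating(content):
    """Return (stars, reviews) as strings; None for a value that is absent."""
    text = content or ""  # .get("data-content") returns None if the attribute is missing
    stars = re.search(r'(\d+(?:\.\d+)?)\s+out of 5 stars', text)
    reviews = re.search(r'Number of reviews:\s*(\d+(?:,\d+)*)', text)
    return (stars.group(1) if stars else None,
            reviews.group(1) if reviews else None)

full = "Buyers rated this dealer:<br><br>4.9 out of 5 stars<br>Number of reviews: 18"
print(parse_rating(full))                        # ('4.9', '18')
print(parse_rating("Number of reviews: 1,204"))  # (None, '1,204')
print(parse_rating(None))                        # (None, None)
```

With this shape, each value can be appended to the list on its own, so a missing rating does not shift the position of the review count.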
Dharman
Dan K

Using re

import re

text = '''<div class="article-seller-rating m-t-1">
<div>
<span class="rating relative js-popover pointer" data-container="body" data-content="Buyers rated this dealer:&lt;br&gt;&lt;br&gt;4.9 out of 5 stars&lt;br&gt;Number of reviews: 18" data-placement="top" data-title="Reviews">
<More un-important stuff here>
</span></div>
</div>'''



numbers = re.findall(r"[-+]?\d*\.\d+|\d+",text)
print(numbers[1])
print(numbers[-1])

output

4.9
18
balderman
  • You forgot about [pony](https://stackoverflow.com/a/1732454/10824407). P.S. Your regex will fail on `+4`, `-5`, etc. – Olvin Roght Oct 15 '21 at 19:56
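
To illustrate the point in the comment about signed integers, here is a small sketch comparing the pattern from the answer with a grouped variant. The names `original`, `grouped` and `sample` are only illustrative, and the grouped pattern is one possible fix rather than the answerer's own.

```python
import re

original = r"[-+]?\d*\.\d+|\d+"      # the sign belongs only to the decimal branch
grouped = r"[-+]?(?:\d*\.\d+|\d+)"   # the sign applies to both branches

sample = "+4 -5 -1.5"
print(re.findall(original, sample))  # ['4', '5', '-1.5']
print(re.findall(grouped, sample))   # ['+4', '-5', '-1.5']
```

Note that on the raw HTML from the answer, the grouped pattern would also pick up `-1` from the class name `m-t-1`, which is one reason to run the regex on the attribute value rather than on the whole markup.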