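import urllib.request
from bs4 import BeautifulSoup

rating = []

for i in range(10):
    # The "start" query parameter is the review offset, stepped by 10 per page.
    url = "https://www.yelp.com/biz/snow-show-flushing?osq=ice%20cream%20shop&start=" + str(10 * i)
    ourUrl = urllib.request.urlopen(url)
    soup = BeautifulSoup(ourUrl, 'html.parser')

    # Collect the aria-label of every matching span, skipping the first match.
    for r in soup.find_all('span', {'class': "display--inline__373c0__1gaV4 border-color--default__373c0__1yxBb"})[1:]:
        per_rating = r.div.get('aria-label')
        rating.append(per_rating)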

I am trying to collect the ratings from every page of reviews. There should be only 58 ratings in total, but the result also includes the ratings from the "You Might Also Consider" section.

How can I fix this?

PM 77-1
    My guess is that that portion of the page is being populated with JavaScript, which cannot be accessed with urllib's urlopen function. Have you considered trying a package like Selenium to obtain the HTML, then parse it with BeautifulSoup (or equivalent)? This link may be helpful: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – jrd1 Nov 03 '21 at 23:04

1 Answer


One possible solution is to retrieve the total number of reviews from Yelp using BeautifulSoup. You can then trim your "rating" list to that number of reviews.

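import re

# Find the total number of reviews. "soup" is the parsed page from the question's loop;
# the "css-foyide" class name comes from the page markup and may change over time.
regex_count = re.compile('.*css-foyide.*')
Review_count = soup.find_all("p", {"class": regex_count})
Review_count = Review_count[0].text  # text of the first matching <p>
Review_count = int(Review_count.split()[0])  # total number of reviews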
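
The trimming step itself could be sketched as below. This is only an illustration under two assumptions that are not confirmed above: that each page lists its real review ratings before the "You Might Also Consider" ones, and that each page holds 10 reviews (as the `start=10*i` offset suggests). Because the extra ratings appear on every page, the sketch trims each page's list before appending rather than trimming the combined list once. The helper `trim_page_ratings` and the sample labels are hypothetical.

```python
def trim_page_ratings(page_ratings, page_index, total_reviews, per_page=10):
    """Keep only the ratings that should belong to real reviews on one page.

    Assumes the review ratings come first in page_ratings and any extra
    ratings (e.g. from a recommendations section) come after them.
    """
    remaining = total_reviews - page_index * per_page
    expected = max(0, min(per_page, remaining))
    return page_ratings[:expected]


# Made-up data standing in for the scraped pages: 58 reviews in total,
# with 2 unrelated ratings appended to every page.
pages = [["5 star rating"] * 10 + ["4 star rating"] * 2 for _ in range(5)]
pages.append(["5 star rating"] * 8 + ["4 star rating"] * 2)

rating = []
for i, page_ratings in enumerate(pages):
    rating.extend(trim_page_ratings(page_ratings, i, 58))

print(len(rating))  # 58
```

In the question's loop, the same call would wrap the list of `aria-label` values gathered from one page, with `Review_count` passed as `total_reviews`.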