Web scraping Yelp review ratings with Python

Question

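import urllib.request
from bs4 import BeautifulSoup

rating = []

for i in range(0, 10):
    # The "start" query parameter advances by 10 for each page of reviews.
    url = "https://www.yelp.com/biz/snow-show-flushing?osq=ice%20cream%20shop&start=" + str(10 * i)
    ourUrl = urllib.request.urlopen(url)
    soup = BeautifulSoup(ourUrl, 'html.parser')
    # Skip the first matching span, then read the aria-label of each remaining one.
    for r in soup.find_all('span', {'class': "display--inline__373c0__1gaV4 border-color--default__373c0__1yxBb"})[1:]:
        per_rating = r.div.get('aria-label')
        rating.append(per_rating)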

I am trying to get the ratings from each page of reviews. There should be only 58 ratings in total, but the result also includes the ratings from the "You might also consider" section.

How can I fix this?

My guess is that that portion of the page is being populated with JavaScript, which cannot be accessed with urllib's urlopen function. Have you considered trying a package like Selenium to obtain the HTML, then parse it with BeautifulSoup (or equivalent)? This link may be helpful: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python — jrd1, Nov 03 '21 at 23:04
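A minimal sketch of what the comment above suggests: load each page in a real browser with Selenium, then pass the rendered HTML to BeautifulSoup. The helper names `page_urls` and `fetch_rendered_html` are hypothetical, and the sketch assumes Selenium and a Chrome driver are installed. The thread does not confirm whether this removes the extra ratings.

```python
def page_urls(base_url, pages, per_page=10):
    """Build one URL per review page by appending a "start" offset."""
    return [f"{base_url}&start={per_page * i}" for i in range(pages)]


def fetch_rendered_html(url):
    """Return the page HTML after the browser has run its JavaScript."""
    # Imported here so page_urls() works even without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes a local Chrome installation
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can replace `ourUrl` in the question's code, e.g. `soup = BeautifulSoup(fetch_rendered_html(url), 'html.parser')`.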

score 0 · Answer 1 · answered Oct 04 '22 at 10:41

One possible solution is to retrieve the total number of reviews from Yelp using BeautifulSoup. You can then trim your `rating` list to that number of reviews.

import re

# find the total number of reviews:
regex_count = re.compile('.*css-foyide.*')  # matches any class containing "css-foyide"
Review_count = soup.find_all("p", {"class": regex_count})
Review_count = Review_count[0].text  # text of the first matching <p>
Review_count = int(Review_count.split()[0])  # total number of reviews
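The trimming step itself could look something like the sketch below. It trims page by page rather than once at the end, because every page may carry its own extra ratings. It assumes 10 reviews per page (as the `start` offsets in the question imply) and that the real review ratings appear before the "you might also consider" ones in each page's results; neither assumption is verified here. The function names are hypothetical.

```python
def expected_on_page(page_index, review_count, per_page=10):
    """Number of real reviews expected on a zero-based page index."""
    remaining = review_count - page_index * per_page
    return max(0, min(per_page, remaining))


def trim_page_ratings(page_ratings, page_index, review_count, per_page=10):
    """Keep only the leading ratings that belong to actual reviews.

    Assumes review ratings come first in page_ratings (unverified).
    """
    keep = expected_on_page(page_index, review_count, per_page)
    return page_ratings[:keep]


# Made-up data: with 58 reviews, the last page (index 5) holds 8 reviews,
# followed here by 3 ratings from the suggestions section.
page = ["5 star rating"] * 8 + ["4 star rating"] * 3
print(len(trim_page_ratings(page, 5, 58)))  # 8
```

In the question's loop, the ratings for one page would be collected into a temporary list and passed through `trim_page_ratings` with `Review_count` before being added to `rating`.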
