from bs4 import BeautifulSoup as bs
import urllib.request

html = urllib.request.urlopen('https://www.yelp.com/biz/the-stillery-nashville?osq=Restaurants+Nashville+Tn').read().decode('utf-8')
soup = bs(html, 'html.parser')

# Each review body is a <p> carrying Yelp's generated comment classes.
relevant = soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

for paragraph in relevant:
    # The review text sits in a nested <span> inside the paragraph.
    for span in paragraph.find_all('span', class_='raw__09f24__T4Ezm'):
        print(span.get_text())

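The code above works and prints the reviews. The code below, which uses requests with a browser User-Agent on a different Yelp page, does not: it raises no error, but relevant2 is empty and nothing is printed. I do not understand why the second version fails.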

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

url = 'https://www.yelp.com/biz/the-hampton-social-nashville-nashville-2?osq=Restaurants+Nashville+Tn'
response = requests.get(url, headers=headers)
soup2 = BeautifulSoup(response.text, 'html.parser')

# Same selectors as the working version; here the result list is empty.
relevant2 = soup2.find_all('p', class_='comment__09f24__gu0rG css-qgunke')
for paragraph in relevant2:
    for span in paragraph.find_all('span', class_='raw__09f24__T4Ezm'):
        print(span.get_text())

I am trying to extract the review text from each page. I tried the code above to scrape restaurant reviews from Yelp.

sha_phys
  • `for div in relevant2:` The second code sample does not have a variable named `relevant2`. Presumably you meant to have code like `relevant2 = soup2.find_all(...)`, but you forgot. – John Gordon Apr 18 '23 at 17:16
  • "does not work" -- show the error in your question. – Charles Duffy Apr 18 '23 at 17:31
  • There is no error; `relevant2` is simply empty, so nothing is printed. – sha_phys Apr 18 '23 at 18:00
  • Perhaps the issue is with the request headers. `urllib` and `requests` send different `User-Agent` values; try passing the same headers in both GET requests. – Übermensch Apr 26 '23 at 12:19
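
The suggestion in the last comment can be tried without changing libraries. The following is a minimal sketch, not a confirmed fix: it sends the User-Agent from the second snippet through `urllib`, so both versions present the same headers. `HEADERS`, `build_request`, and `fetch_html` are hypothetical names introduced here for illustration.

```python
import urllib.request

# Browser-like headers copied from the second snippet in the question.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    )
}


def build_request(page_url, headers=HEADERS):
    """Return a urllib Request carrying the given headers (hypothetical helper)."""
    return urllib.request.Request(page_url, headers=headers)


def fetch_html(page_url, headers=HEADERS, timeout=10):
    """Download a page with urllib, sending the same headers as the requests version."""
    request = build_request(page_url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_html(url)` and `requests.get(url, headers=HEADERS).text` on the same page should then differ only in the library used, which helps separate a header problem from a page-specific one.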

1 Answer

Based on https://stackoverflow.com/a/38114548, try disabling certificate verification in the request:

requests.get(url, headers=headers, verify=False)

  • The `verify` argument did not help. – sha_phys Apr 18 '23 at 18:00
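
Since the request raises no error and `verify=False` changes nothing, one way to narrow the problem down is to look at what the server actually returned before parsing it. The sketch below assumes nothing about the cause; `describe_response` is a hypothetical helper, not part of `requests` or BeautifulSoup.

```python
def describe_response(status_code, body, class_name):
    """Summarize a fetched page so an empty find_all() result can be explained."""
    return {
        "status_code": status_code,          # e.g. response.status_code
        "body_length": len(body),            # e.g. len(response.text)
        "class_present": class_name in body, # is the expected class in the raw HTML?
    }
```

With the second snippet this would be called as `describe_response(response.status_code, response.text, 'comment__09f24__gu0rG')`. If `class_present` is `False`, the markup that came back does not match the selectors (for example, a different page was served or the generated class names differ), and that is where to look next.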