
I am trying to scrape the Zomato website. I just want the comments for one restaurant.

import requests
from bs4 import BeautifulSoup
import re

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
r = requests.get('https://www.zomato.com/mumbai/joeys-pizza-malad-west/reviews',headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
#regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class':'sc-1hez2tp-0 sc-eomEcv kOjze'})
reviews = [result.text for result in results]

But this code takes too long to run. Is my implementation wrong?

I want to scrape the comments of all users from this URL: https://www.zomato.com/mumbai/joeys-pizza-malad-west/reviews

pratik patil

1 Answer


Change your headers as follows:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}

Also, the class names are generated dynamically and keep changing. Instead, find the class sc-1hez2tp-0 fKvqMN, which doesn't change, and then use the find_previous() method to get the preceding p tag, which contains the desired text.

import requests
from bs4 import BeautifulSoup


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"
}
r = requests.get(
    "https://www.zomato.com/mumbai/joeys-pizza-malad-west/reviews", headers=headers
)
soup = BeautifulSoup(r.text, "html.parser")

for tag in soup.find_all("p", class_="sc-1hez2tp-0 fKvqMN"):
    print(tag.find_previous("p").text)

Output:

I had eaten veg chefs favourite ( mushroom ) a couple of years back , and I still remembered the delicious taste of mushrooms and that's why ordered it again today , but today all it had was boiled mushrooms , totally disappointed , ruined my taste buds , will never ever order again from here...I guess they did a shortcut due to high demand as delivery was also done very late after a reminder
JOEYS IS THE BEST PIZZA EVER. They give a lot of stuffings , the cheese is to die for . Joey’s Will always be my first choice . Paneer makhni and chicken makhani are the best ❤️
Their pizza may be good but they hve worst delivery patterns… my todays order i have been waiting for 1 1/2 hr for single potion pizza… worst and frustrating service… not for people who wants to enjoy pizza… now i feel its over rated… 

Famous Pizza corner Joey's pizza was so amazing and delicious also the toppings was so amazing . The thin crust cheese burst pizza was so yummy 
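
The find_previous() idea can also be tried offline on a small HTML snippet, which avoids waiting on the network while experimenting. The following is only a minimal sketch: the markup and the class names (abc123, xyz789, stable-marker) are made up for illustration and are not Zomato's actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each review <p> has an unpredictable class name and is
# followed by a <p> whose class name stays the same (here "stable-marker").
html = """
<div>
  <p class="abc123">Great pizza, quick delivery.</p>
  <p class="stable-marker">2 days ago</p>
</div>
<div>
  <p class="xyz789">Cold food, arrived late.</p>
  <p class="stable-marker">5 days ago</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate every stable element, then step back to the nearest preceding <p>,
# which holds the review text.
reviews = [
    marker.find_previous("p").get_text(strip=True)
    for marker in soup.find_all("p", class_="stable-marker")
]
print(reviews)
```

Anchoring on the stable element means the changing class name on the review paragraph never has to be known in advance.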
MendelG
  • Thank you for your comment, but I am still not able to get the users' comments. – pratik patil Jun 27 '21 at 03:28
  • How did you choose the headers? Could I have your email ID, please? – pratik patil Jun 27 '21 at 06:10
  • It only shows the comments from page one. How can we get the comments from the other pages? – pratik patil Jun 27 '21 at 07:30
  • @pratikpatil 1. Regarding the `headers`: you can find them in your browser's Developer Tools. See [How can I view HTTP headers in Google Chrome?](https://stackoverflow.com/questions/4423061/view-http-headers-in-google-chrome). 2. Getting the other pages is a completely separate question; please consider asking a new question on Stack Overflow. – MendelG Jun 27 '21 at 21:41