What I am trying to do here:
I am trying to crawl Yelp and get the reviews from a particular business page. I want to modify the script below so that it takes the restaurant name as input instead of having it hard-coded in the URL.
For example:
User Input: dennys-san-jose-5
URL: http://www.yelp.com/biz/**dennys-san-jose-5**
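
The mapping in this example can be sketched as follows. This is only a minimal illustration, assuming the restaurant name arrives as the first command-line argument; `build_review_url` and `read_business_id` are hypothetical helper names, and the `?start=` offset is taken from the script shown further down.

```python
BASE_URL = 'http://www.yelp.com/biz/'  # base URL from the example above


def build_review_url(business_id, start=0):
    """Build the review page URL for a business, starting at review offset `start`."""
    return BASE_URL + business_id + '?start=' + str(start)


def read_business_id(argv):
    """Return the restaurant name from an argv-style list."""
    # argv[0] is the script name, so the restaurant name is expected in argv[1].
    if len(argv) < 2:
        raise ValueError('usage: python script.py <restaurant-name>')
    return argv[1]


# Simulates `python script.py dennys-san-jose-5`; a real script would pass sys.argv.
business_id = read_business_id(['script.py', 'dennys-san-jose-5'])
print(build_review_url(business_id))      # http://www.yelp.com/biz/dennys-san-jose-5?start=0
print(build_review_url(business_id, 40))  # http://www.yelp.com/biz/dennys-san-jose-5?start=40
```

The point of the sketch is that the restaurant name and the paging offset are two separate values, and both end up in the URL.
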
This is the actual script I am using right now:

    # Python 2 script (urllib.urlopen)
    from bs4 import BeautifulSoup
    from urllib import urlopen

    queries = 0
    while queries < 201:
        stringQ = str(queries)
        page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)
        soup = BeautifulSoup(page)
        reviews = soup.findAll('p', attrs={'itemprop': 'description'})
        authors = soup.findAll('span', attrs={'itemprop': 'author'})
        flag = True
        indexOf = 1
        # Strip the HTML tags from each review and append the text to reviews.txt
        for review in reviews:
            dirtyEntry = str(review)
            while dirtyEntry.index('<') != -1:
                indexOf = dirtyEntry.index('<')
                endOf = dirtyEntry.index('>')
                if flag:
                    dirtyEntry = dirtyEntry[endOf+1:]
                    flag = False
                else:
                    if endOf + 1 == len(dirtyEntry):
                        cleanEntry = dirtyEntry[0:indexOf]
                        break
                    else:
                        dirtyEntry = dirtyEntry[0:indexOf] + dirtyEntry[endOf+1:]
            f = open("reviews.txt", "a")
            f.write(cleanEntry)
            f.write("\n")
            f.close()
        # Strip the surrounding tag from each author and append the name to bla.txt
        for author in authors:
            dirty = str(author)
            closing = dirty.index('>')
            dirty = dirty[closing+1:]
            opening = dirty.index('<')
            cleanEntry = dirty[0:opening]
            f = open("bla.txt", "a")
            f.write(cleanEntry)
            f.write("\n")
            f.close()
        queries = queries + 40

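
The tag-stripping loops in the script can be sketched more compactly. This is an assumption-laden illustration, not what the script does: it swaps the manual index slicing for a regular expression, `strip_tags` is a hypothetical helper name, and the sample markup is made up.

```python
import re

# Matches anything that looks like an HTML tag, e.g. <p ...>, </p>, <br/>.
TAG_PATTERN = re.compile(r'<[^>]*>')


def strip_tags(markup):
    """Remove tag-like substrings and trim surrounding whitespace."""
    return TAG_PATTERN.sub('', markup).strip()


# Made-up sample resembling one review element from the page.
sample = '<p itemprop="description">Great park.<br/>Will come back.</p>'
print(strip_tags(sample))  # Great park.Will come back.
```

A regular expression is not a full HTML parser, so this only holds for simple markup. Calling BeautifulSoup's own `get_text()` on each tag is another option.
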
I am trying to read the restaurant name as a parameter, but it does not work.
What I did:

    while queries <201:
        stringQ = str(queries)
        page = urlopen('http://www.yelp.com/biz/' + stringQ)

But it does not work. I pass dennys-san-jose-5 as input on the command line (python script.py dennys-san-jose-5).
Please point out what the issue is and how I can fix it.