
What I am trying to do here:

I am trying to crawl Yelp and get reviews from a particular page. However, I want to modify this script so that it takes the restaurant name as input.

For example:

User Input: dennys-san-jose-5

URL: http://www.yelp.com/biz/**dennys-san-jose-5**

This is the actual script I am using right now:

from bs4 import BeautifulSoup
from urllib import urlopen
queries = 0
while queries <201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)

    soup = BeautifulSoup(page)
    reviews = soup.findAll('p', attrs={'itemprop':'description'})
    authors = soup.findAll('span', attrs={'itemprop':'author'})

    flag = True
    indexOf = 1
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if(endOf+1 == len(dirtyEntry)):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
        f=open("reviews.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close()

    for author in authors:
        dirty = str(author)
        closing = dirty.index('>')
        dirty = dirty[closing+1:]
        opening = dirty.index('<')
        cleanEntry = dirty[0:opening]
        f=open("bla.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close()
    queries = queries + 40

I am trying to read the restaurant name as a parameter, but it does not work.

What I did:

while queries <201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/' + stringQ)

But it does not work. I am passing dennys-san-jose-5 as input on the command line (python script.py dennys-san-jose-5).

Please tell me what the issue is and how I can fix it.

Regards,

Dark Knight

1 Answer


To read arguments from the command line, you can use argparse.

import argparse

#Define command line arguments
parser = argparse.ArgumentParser(description='Get Yelp reviews.')
parser.add_argument("-p", "--page", dest="page", required=True, help="the page to parse")

#parse command line arguments
args = parser.parse_args()

Your page name will now be in args.page. In this example, you would run the script like this:

>python script.py -p dennys-san-jose-5

or

>python script.py --page dennys-san-jose-5
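
To connect this back to the script in the question, a minimal sketch of plugging `args.page` into the URL might look like the following. The helper name `build_url` is hypothetical, and the `?start=` offset and the 0 to 200 range in steps of 40 are carried over from the original script. The explicit list passed to `parse_args` is only there so the sketch runs on its own; calling `parse_args()` with no arguments reads the real command line.

```python
import argparse

BASE_URL = 'http://www.yelp.com/biz/'  # base URL taken from the question


def build_url(page, start):
    # hypothetical helper: page name plus the review offset
    return BASE_URL + page + '?start=' + str(start)


parser = argparse.ArgumentParser(description='Get Yelp reviews.')
parser.add_argument("-p", "--page", dest="page", required=True, help="the page to parse")

# explicit list for illustration only; use parser.parse_args() in the real script
args = parser.parse_args(['-p', 'dennys-san-jose-5'])

# same offsets as the original while loop: 0, 40, ..., 200
urls = [build_url(args.page, start) for start in range(0, 201, 40)]
print(urls[0])
```

The point of the sketch is that the page name and the offset are two separate pieces of the URL; the attempt in the question replaced the page name with the offset instead of adding to it.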


Edit:

  • If you don't need any fancy stuff and just want the raw command line input (for example, in a program that only you will be using, with no need to validate input):

    import sys
    print sys.argv
    
  • If you want to prompt the user for a page name as the program is running: Python: user input and commandline arguments
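
For the bare `sys.argv` route, a minimal sketch might look like this. `page_from_argv` is a hypothetical helper, not part of any library, and the usage message is made up for illustration. It assumes the page name is the first argument after the script name, as in `python script.py dennys-san-jose-5`.

```python
import sys


def page_from_argv(argv=None):
    # fall back to the real command line when no list is given
    if argv is None:
        argv = sys.argv
    # argv[0] is the script name; the first real argument is argv[1]
    if len(argv) < 2:
        raise SystemExit('usage: python script.py <page-name>')
    return argv[1]


# explicit list so the sketch runs on its own;
# the real script would simply call page_from_argv()
page = page_from_argv(['script.py', 'dennys-san-jose-5'])
url = 'http://www.yelp.com/biz/' + page
print(url)
```

The length check is optional, but without it a missing argument surfaces as an `IndexError` rather than a readable message.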

leo
  • Thanks for your comment. Is there any way I could pass it directly, without using -p or anything? In Java I could pass `"dennys-san-jose-5"` and have it appended to the end of the URL, as in `"http://www.yelp.com/biz/" + Query`. Sorry, I am a beginner in Python. – Dark Knight Apr 16 '14 at 08:42
  • To add to my comment, I have used sys: `import sys; stringQ = sys.argv[1]; page = urlopen('http://www.yelp.com/biz/' + stringQ)` and it started to work! Is that the right way? – Dark Knight Apr 16 '14 at 08:46
  • If you don't need all the fancy functionality of argparse, you can get the raw command line arguments from `sys.argv`. – leo Apr 16 '14 at 08:46
  • Thank you so much for the explanation! I learnt something today! :) I have one more question: all the reviews get appended on every run, but I want to overwrite the file (or create it again) every time I search. I changed `f=open("reviews.txt", "a")` to `f=open("reviews.txt", "w")`, but it doesn't work. Can you suggest why? Are there any alternatives? – Dark Knight Apr 16 '14 at 08:51
  • @RockyBalBoa Please [post a new question](http://stackoverflow.com/questions/ask) about that (if it hasn't already [been answered](http://stackoverflow.com/questions/2424000/read-and-overwrite-a-file-in-python) somewhere), so that others can find it too! – leo Apr 16 '14 at 08:55
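
On the follow-up about overwriting `reviews.txt`: one likely cause, assuming the `open(..., "w")` call stayed inside the `for` loop as in the original script, is that `"w"` truncates the file every time it is opened, so each review wipes out the previous one. A minimal sketch that opens the file once per run instead is shown below. The `save_reviews` helper, the sample strings, and the temporary path are all placeholders for illustration.

```python
import os
import tempfile


def save_reviews(path, reviews):
    # "w" truncates the file once, when it is opened here;
    # every write inside the loop then adds to the fresh file
    with open(path, 'w') as f:
        for review in reviews:
            f.write(review + '\n')


# placeholder data and a temporary path, for illustration only
path = os.path.join(tempfile.mkdtemp(), 'reviews.txt')
save_reviews(path, ['old review'])                      # an earlier run
save_reviews(path, ['first review', 'second review'])   # a later run replaces it

with open(path) as f:
    print(f.read())
```

In the script from the question, that would mean opening the file before the `while` loop rather than once per review, so that all pages of one search end up in the same freshly created file.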