
I am a newbie building a web scraper that will grab (and eventually export to CSV) the addresses, postcodes and phone numbers of all UK McDonald's restaurants. I am using an aggregator instead of the McDonald's website:

https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/

I have borrowed and repurposed some code:

from bs4 import BeautifulSoup
from urllib2 import urlopen

BASE_URL = "https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/"

def get_category_links(section_url):
    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    boccat = soup.find("tr")
    category_links = [BASE_URL + tr.a["href"] for tr in boccat.findAll("h2")]
    return category_links

def get_restaurant_details(category_url):
    html = urlopen(category_url).read()
    soup = BeautifulSoup(html, "lxml")
    streetAddress = soup.find("span", "streetAddress").string
    addressLocality = [h2.string for h2 in soup.findAll("span", "addressLocality")]
    addressRegion = [h2.string for h2 in soup.findAll("span", "addressRegion")]
    postalCode = [h2.string for h2 in soup.findAll("span", "postalCode")]
    phoneNumber = [h2.string for h2 in soup.findAll("td", "b")]
    return {"streetAddress": streetAddress,
            "addressLocality": addressLocality,
            "postalCode": postalCode,
            "addressRegion": addressRegion,
            "phoneNumber": phoneNumber}

I don't think I have grabbed the data, because when I run the following line:

print(postalCode)

or

print(addressLocality)

I get the following error:

NameError: name 'postalCode' is not defined

Any idea what I'm doing wrong?

JasonC
  • well to start you need to call your functions ... oh the peril of copying and pasting... – Joran Beasley Mar 06 '17 at 21:45
  • `get_restaurant_details` returns a dict. If you want to access the data in that dict, you need to index it. It won't automatically create a bunch of new variables in the scope that called it; those `postalCode` and other variables are local. – user2357112 Mar 06 '17 at 21:45
  • Where is `postalCode` defined? Is your print statement within the scope that `postalCode` exists in? Names defined in functions don't just leak out so you can use them anywhere! Imagine trying to keep track of what is defined where. – Peter Wood Mar 06 '17 at 21:48
  • can someone give me an example of how I call my functions in a way that might deliver me some of the data I'm trying to grab? Searching around online - I can see this type of format might help `get_category_links(https://www.localstore.co.uk/stores/75639/mcdonalds-restaurant/)` - but it returns a bunch of errors. I worry that I'm not yet advanced enough to tackle this – JasonC Mar 07 '17 at 13:17

1 Answer


As others have commented, the first thing you need to do is actually call your functions.

Do something like this

if __name__ == '__main__':
    res = "https://www.localstore.co.uk/store/329213/mcdonalds-restaurant/london/"
    print(get_restaurant_details(res)["postalCode"])

after your two functions. I took a URL from the site that should work with your program, but I have not tested it. The main problem right now is that your script only defines the functions and never runs them. You need to call a function!
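
To see why `print(postalCode)` fails on its own, here is a small sketch that needs no network access. The function body is a stand-in I made up for illustration (the values are placeholders, not real scraped data); only the function name, the dict keys and the URL come from the code above. It shows that the URL has to be passed as a quoted string, that you read the result by indexing the returned dict, and that names created inside a function do not exist outside it.

```python
def get_restaurant_details(category_url):
    # Stand-in for the real scraper: hard-coded placeholder values,
    # no request is made and category_url is not actually fetched.
    postalCode = ["AB1 2CD"]       # made-up placeholder, not real data
    addressLocality = ["London"]   # made-up placeholder, not real data
    return {"postalCode": postalCode,
            "addressLocality": addressLocality}


# The URL must be a quoted string, otherwise Python raises a SyntaxError.
details = get_restaurant_details(
    "https://www.localstore.co.uk/store/329213/mcdonalds-restaurant/london/"
)

# Index the returned dict to get at a value.
print(details["postalCode"])

# postalCode only exists inside the function, so this raises NameError.
try:
    print(postalCode)
except NameError as err:
    print("NameError:", err)
```

If you want a plain variable to work with, assign it yourself after the call, for example `postalCode = details["postalCode"]`.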