I have written a script that scrapes a URL. It works fine on Linux OS. But i am getting http 503 error when running on Windows 7. The URL has some issue. I am using python 2.7.11 . Please help. Below is the script:
import sys # Used to add the BeautifulSoup folder the import path
import urllib2 # Used to read the html document
if __name__ == "__main__":
### Import Beautiful Soup
### Here, I have the BeautifulSoup folder in the level of this Python script
### So I need to tell Python where to look.
sys.path.append("./BeautifulSoup")
from bs4 import BeautifulSoup
### Create opener with Google-friendly user agent
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
### Open page & generate soup
### the "start" variable will be used to iterate through 10 pages.
for start in range(0,1000):
url = "http://www.google.com/search?q=site:theknot.com/us/&start=" + str(start*10)
page = opener.open(url)
soup = BeautifulSoup(page)
### Parse and find
### Looks like google contains URLs in <cite> tags.
### So for each cite tag on each page (10), print its contents (url)
file = open("parseddata.txt", "wb")
for cite in soup.findAll('cite'):
print cite.text
file.write(cite.text+"\n")
# file.flush()
# file.close()
In case you run it in windows 7, the cmd throws http503 error stating the issue is with url. The URL works fine in Linux OS. In case URL is actually wrong please suggest the alternatives.