I'm writing a Python program to study the HTML source code used in different countries, and I'm testing it in a Unix shell. The code I have so far works fine, except that I'm getting `HTTP Error 403: Forbidden`. By testing it line by line, I've narrowed the problem down to line 27 of my script, the `urlopen` call for `url3`:

```python
url3response = urllib2.urlopen(url3)
url3Content = url3response.read()
```
Every other URL responds fine; only the third one (http://www.harvard.edu) fails. Any ideas?
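To see exactly which URL is refused and with what status, without testing line by line, the `HTTPError` raised by `urlopen` can be caught and printed. This is a minimal sketch; the function name `fetch_or_report` is hypothetical, and the import fallback is only there so the same code also runs on Python 3.

```python
# Python 2 (urllib2, as in the question), with a fallback so it also runs on Python 3
try:
    from urllib2 import urlopen, HTTPError
except ImportError:
    from urllib.request import urlopen
    from urllib.error import HTTPError


def fetch_or_report(url):
    """Return the response body, or None after printing why the server refused."""
    try:
        response = urlopen(url)
    except HTTPError as e:
        # e.code is the numeric status (403 here); e.msg is the reason phrase
        print("%s -> HTTP %d %s" % (url, e.code, e.msg))
        return None
    return response.read()
```

Calling `fetch_or_report` on each URL in turn prints one line per refused URL and keeps going, instead of stopping the whole script at the first 403.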
Here is the text file I'm reading from (top5_US.txt):
```
http://www.caltech.edu
http://www.stanford.edu
http://www.harvard.edu
http://www.mit.edu
http://www.princeton.edu
```
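Since the file holds one URL per line, it could also be read into a list in a single pass instead of one variable per line. A minimal sketch, assuming blank lines should be skipped; `read_urls` is a hypothetical helper name, not part of my current code.

```python
def read_urls(path):
    """Return the non-blank, stripped lines of the file at path as a list of URLs."""
    # The with-block closes the file automatically, even if an error occurs
    with open(path) as text_file:
        return [line.strip() for line in text_file if line.strip()]


# Example, assuming top5_US.txt is in the current directory:
# urls = read_urls('top5_US.txt')
```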
And here is my code:
```python
import urllib2

# Open the desired text file (in this case, "top5_US.txt")
text_file = open('top5_US.txt', 'r')

# Read each line of the text file
firstLine = text_file.readline().strip()
secondLine = text_file.readline().strip()
thirdLine = text_file.readline().strip()
fourthLine = text_file.readline().strip()
fifthLine = text_file.readline().strip()

# Turn each line into a URL variable
url1 = firstLine
url2 = secondLine
url3 = thirdLine
url4 = fourthLine
url5 = fifthLine

# Read URL 1, get its content, and store it in a variable
url1response = urllib2.urlopen(url1)
url1Content = url1response.read()
# Read URL 2, get its content, and store it in a variable
url2response = urllib2.urlopen(url2)
url2Content = url2response.read()
# Read URL 3, get its content, and store it in a variable
url3response = urllib2.urlopen(url3)
url3Content = url3response.read()
# Read URL 4, get its content, and store it in a variable
url4response = urllib2.urlopen(url4)
url4Content = url4response.read()
# Read URL 5, get its content, and store it in a variable
url5response = urllib2.urlopen(url5)
url5Content = url5response.read()

text_file.close()
```
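One possible cause, which I haven't confirmed for this particular site: some servers refuse requests that carry urllib's default `User-Agent` header (`Python-urllib/` followed by the Python version). Under that assumption, here is a minimal sketch that replaces the five copy-pasted blocks with a loop and sends a custom `User-Agent`. The header value and the names `USER_AGENT`, `build_request` and `fetch_all` are hypothetical, and whether a given server accepts this header is not guaranteed.

```python
# Works with urllib2 (Python 2, as in the question) or urllib.request (Python 3)
try:
    from urllib2 import Request, urlopen, HTTPError
except ImportError:
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

# Hypothetical User-Agent string; the value a given server accepts is an assumption
USER_AGENT = "Mozilla/5.0 (compatible; html-survey-script)"


def build_request(url):
    """Wrap url in a Request that sends USER_AGENT instead of urllib's default."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(urls):
    """Map each URL to its body, or to the HTTP status code if the server refused."""
    results = {}
    for url in urls:
        try:
            results[url] = urlopen(build_request(url)).read()
        except HTTPError as e:
            results[url] = e.code  # e.g. 403
    return results
```

Storing the status code instead of raising lets one refused site be reported without losing the pages that did download.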