I have a program that reads some URLs from a text file, gets the page source with requests.get, and then uses beautifulsoup4 to find some information.
import requests
import bs4

f = open('inputfile.txt')
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})
for line in f:
    x = 0
    z = len(line)
    r = session.get(line[x:z])
    soup = bs4.BeautifulSoup(r.text, "html.parser")
This returns an HTTP 400 Bad Request (Invalid URL) error. However, when I do the same thing but type the URL out as a string literal, everything works (although it only ever fetches that one URL).
import requests
import bs4

f = open('inputfile.txt')
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})
for line in f:
    r = session.get('http://www.ExactSameUrlAsEarlier.com')
    soup = bs4.BeautifulSoup(r.text, "html.parser")
How would I fix/modify this so it cycles through all the URLs in the file? For clarification, this is what inputfile.txt looks like:
http://www.url1.com/something1
http://www.url2.com/something2
etc.
Thanks in advance.
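Edit: my current guess (an assumption, not verified against my actual file) is that each line read from the file keeps its trailing newline, so the string passed to session.get() isn't quite the URL I think it is. A minimal check of that behavior:

```python
# Lines read via "for line in f" keep their trailing newline, so the
# string handed to session.get() ends with '\n', which is not a valid
# URL. str.strip() removes the surrounding whitespace.
line = 'http://www.url1.com/something1\n'  # what "for line in f" yields
print(repr(line))          # shows the trailing '\n'
print(repr(line.strip()))  # newline removed
```

If that's the cause, passing line.strip() to session.get() in the loop should fix it, but I'd appreciate confirmation.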