The script I have written works, but it does not return the title for every site (that is my goal: to fetch a website's title and print it back). Sites like Google work, but others, such as this very site, Stack Overflow, raise an error.
Here is my code:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen("http://lxml.de"))
print soup.title.string
It would be great if you could help me with the following :)
- Whether any improvements could be made to the code (and to how it handles variables)
- How to fix the issue that it doesn't return the title for some sites (and how to handle errors in general)
- When the code does work, it actually emits a UserWarning saying that I should add a special "html.parser" after the script, but it didn't work after I put that in.
BTW, here is the error (exactly as it was printed):
Traceback (most recent call last):
  File "C:\Users\NAME\Desktop\NETWORK\personal work\PROGRAMMING\Python\bibliography PYTHON\TEMP.py", line 5, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://stackoverflow.com/questions/36496222/beautiful-soup-4-not-working-consistent"))
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Program Files (x86)\PYTHON 27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Press any key to continue . . .
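One possible way to approach the error-handling and parser-warning points above, as a minimal sketch rather than a definitive fix. It assumes the 403 comes from the server rejecting urllib2's default User-Agent (a common cause, but not something the traceback confirms), so it sends an explicit one. The names `build_request`, `get_title`, and `USER_AGENT`, the header value, and the 10-second timeout are hypothetical choices made for illustration. The sketch also passes "html.parser" as the second argument to `BeautifulSoup`, which is the form the warning asks for, and catches `HTTPError` and `URLError` instead of letting the traceback escape.

```python
try:
    # Python 2, as used in the question
    from urllib2 import Request, urlopen, HTTPError, URLError
except ImportError:
    # Python 3 moved these names into urllib.request and urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError

# Hypothetical header value; which string a given site accepts is an assumption.
USER_AGENT = "Mozilla/5.0"


def build_request(url):
    """Return a Request that sends an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def get_title(url):
    """Return the page title, or None if the fetch fails or no <title> exists."""
    # Imported here so build_request can be used without bs4 installed.
    from bs4 import BeautifulSoup

    try:
        response = urlopen(build_request(url), timeout=10)
    except HTTPError as err:
        # The server answered, but with an error status such as 403.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error %d for %s" % (err.code, url))
        return None
    except URLError as err:
        # DNS failure, refused connection, and similar problems.
        print("Could not reach %s: %s" % (url, err.reason))
        return None

    # Naming the parser explicitly is what the UserWarning asks for.
    soup = BeautifulSoup(response.read(), "html.parser")
    if soup.title is None:
        return None
    return soup.title.string
```

With this, `get_title("http://lxml.de")` should return the title string, and a blocked or unreachable URL should print a short message and return `None` instead of raising.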