BeautifulSoup HTTPResponse has no attribute encode

Question

I'm trying to get BeautifulSoup working with a URL, like the following:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html.encode("utf-8"), "html.parser")
print(soup.find_all('a'))

However, I am getting an error:

 File "c:\Python3\ProxyList.py", line 3, in <module>
    html = urlopen("http://proxies.org").encode("utf-8")
AttributeError: 'HTTPResponse' object has no attribute 'encode'

Any idea why? Could it be related to the urlopen function? Why does it need UTF-8?

There clearly seem to be some differences between Python 3 with BeautifulSoup4 and the examples that are given (which seem to be out of date or wrong now)...

This ended up being the solution that was needed - http://stackoverflow.com/questions/32382686/unicodeencodeerror-charmap-codec-cant-encode-character-u2010-character-m — Ke., Jan 29 '17 at 20:41

score 0 · Answer 1 · answered Jan 29 '17 at 20:29

It's not working because urlopen returns an HTTPResponse object and you were treating it as the HTML itself. You need to call the .read() method on the response to get the HTML, which comes back as bytes:

from urllib.request import urlopen
from bs4 import BeautifulSoup

response = urlopen("http://proxies.org")
html = response.read()  # bytes, not str
soup = BeautifulSoup(html.decode("utf-8"), "html.parser")
print(soup.find_all('a'))

Also note that you want html.decode("utf-8") rather than html.encode("utf-8"): read() returns bytes, and bytes are decoded into a string, not encoded.
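To make the bytes/str distinction concrete, here is a minimal sketch that needs no network access. The `raw` literal is a made-up stand-in for what `response.read()` would return; it is not taken from the real site.

```python
# Hypothetical stand-in for the bytes that response.read() would return.
raw = b'<a href="/one">caf\xc3\xa9</a>'

# bytes objects offer decode() (bytes -> str) but have no encode().
print(hasattr(raw, "decode"))  # True
print(hasattr(raw, "encode"))  # False

# Decoding turns the raw bytes into text that can be handed to a parser.
text = raw.decode("utf-8")
print(type(text).__name__)     # str

# str objects are the other way round: encode() (str -> bytes), no decode().
print(hasattr(text, "encode"), hasattr(text, "decode"))  # True False
```

This is why calling .encode() on the response, or on the bytes it returns, raises AttributeError in Python 3.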

Josh Crozier

Hi Josh, this is still not working for me. I'm using exactly the same code as you and it's giving me a "character maps to <undefined>" error – Ke. Jan 29 '17 at 20:36
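For context, "character maps to <undefined>" is what Python's charmap codecs report when a character has no byte in the target encoding. That commonly happens when print() writes to a Windows console. The sketch below reproduces the failure without a console and shows one possible workaround. The helper name `printable`, the sample string, and the choice of cp1252 as the console encoding are assumptions for illustration, not details given in this thread.

```python
def printable(text: str, encoding: str) -> str:
    """Return text with characters the encoding cannot represent replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample: U+2010 HYPHEN has no mapping in cp1252.
sample = "proxy\u2010list"

try:
    sample.encode("cp1252")
except UnicodeEncodeError as exc:
    # exc.reason carries the codec's explanation of the failure.
    print("encode failed:", exc.reason)

# The helper swaps the unencodable character for '?' so printing cannot fail.
print(printable(sample, "cp1252"))
```

In a real script the equivalent call would be something like `print(printable(str(soup.find_all('a')), sys.stdout.encoding))`. On Python 3.7+, `sys.stdout.reconfigure(encoding="utf-8")` is another option if the terminal can display UTF-8.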

score 0 · Answer 2 · answered Jan 29 '17 at 20:41

Try this. Note that read() returns bytes, which have decode() but no encode():

soup = BeautifulSoup(html.read().decode('utf-8'), "html.parser")

orvi

score 0 · Answer 3 · answered Jan 30 '17 at 05:58

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('a'))

urlopen returns a file-like object. BeautifulSoup accepts a file-like object and decodes it automatically, so you do not need to encode or decode anything yourself.

From the documentation:

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("index.html"))

soup = BeautifulSoup("<html>data</html>")

First, the document is converted to Unicode, and HTML entities are converted to Unicode characters.
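The file-like behaviour can be checked offline with a rough sketch like the one below. `io.BytesIO` stands in for the HTTPResponse because both expose read(); the markup and the name `fake_response` are made up for the example.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical page content; BytesIO gives it a read() method like an HTTPResponse.
fake_response = io.BytesIO(
    b'<html><body><a href="/one">One</a><a href="/two">Two</a></body></html>'
)

# BeautifulSoup calls read() on file-like input and decodes the bytes itself.
soup = BeautifulSoup(fake_response, "html.parser")

# Collect the href attribute of every anchor tag.
links = [a["href"] for a in soup.find_all("a")]
print(links)
```

No explicit encode() or decode() call appears anywhere, which is the point of this answer.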
