4

I'm writing a Python program that searches a webpage for a word. However, when I try this:

website = urllib.request.urlopen(url)
content = website.read()
website.close()
test = html2text.html2text(content)
print(test)

I get this error:

test = html2text.html2text(content)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 840, in html2text
return h.handle(html)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 129, in handle
self.feed(data)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 125, in feed
data = data.replace("</' + 'script>", "</ignore>")
TypeError: a bytes-like object is required, not 'str'

I'm new to Python, so I'm not sure how to deal with this error.
Python 3.5, Mac.

alecxe
KamilDev

1 Answer

3

read() returns bytes, while html2text expects a str. Decode the content with the charset sent in the Content-Type header:

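import urllib.request

resource = urllib.request.urlopen(url)
content = resource.read()  # bytes
# get_content_charset() returns None if the header names no charset;
# falling back to UTF-8 is an assumption, not something the server promised
charset = resource.headers.get_content_charset() or "utf-8"
content = content.decode(charset)  # str, safe to pass to html2text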

Works for me (Python 3.5, Mac OS).
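
For anyone who wants to try the decoding step without hitting the network, here is a minimal sketch. `decode_body` and `contains_word` are hypothetical helper names, not part of urllib or html2text, and the UTF-8 fallback is an assumption. The `Message` object stands in for `resource.headers`, which in urllib is an `http.client.HTTPMessage` (a subclass of `email.message.Message`), so `get_content_charset()` behaves the same way.

```python
import re
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode response bytes using the charset from Content-Type, if any."""
    # get_content_charset() returns None when no charset is declared;
    # the UTF-8 fallback is an assumption, not something the server promised.
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")


def contains_word(text, word):
    """Case-insensitive whole-word search in already-decoded text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


# Stand-ins for resource.headers and resource.read(), so no network is needed.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
raw = "<p>Un caf\u00e9 noir</p>".encode("iso-8859-1")

text = decode_body(raw, headers)
print(contains_word(text, "caf\u00e9"))
```

With a real page you would pass `resource.read()` and `resource.headers` instead of the stand-ins, and could run the decoded string through html2text before searching.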

alecxe
  • I got this error : `charset = resource.headers.get_content_charset() AttributeError: module 'resource' has no attribute 'headers'` – KamilDev Dec 27 '15 at 13:20
  • @Kamdroid are you sure you are trying that on Python 3.5? – alecxe Dec 27 '15 at 22:18
  • It said `Python 2.7.10`, weird. I did download Python 3 though. Maybe it's because before downloading v3 I downloaded v2? Although I deleted the Python 2 folder before downloading v3. But I have the Python Folder that's labelled Python 3.5. – KamilDev Dec 28 '15 at 01:11
  • @Kamdroid have you tried running it as `python3.5`? – alecxe Dec 28 '15 at 01:16
  • Huh? I only have Python 3.5 apps, I think. What's the directory for Python 2? – KamilDev Dec 28 '15 at 04:47
  • I'm pretty sure it is running on Python 3.5, as it includes `3.5.1` at the end of the file name in the window – KamilDev Dec 29 '15 at 10:55
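
A small sketch that may help with the two open points in the comments above: which interpreter is actually running the script, and what the `AttributeError` wording implies. The message says `module 'resource'`, so at that point the name `resource` referred to a module object rather than to the response returned by `urlopen`. The `types.ModuleType("resource")` object below is only a hypothetical stand-in used to reproduce that situation.

```python
import sys
import types

# Which interpreter is actually running this script?
print(sys.version_info[:3])

# Hypothetical stand-in: a module object that happens to be named "resource",
# used here in place of the response object that urlopen() would return.
resource = types.ModuleType("resource")

try:
    resource.headers
except AttributeError as exc:
    message = str(exc)

print(message)
print(type(resource))
```

If `type(resource)` in the real script prints a module type, the assignment from `urllib.request.urlopen(url)` either did not run or was overwritten by a later `import resource`.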