I am trying to write a crawler in Python by following a Udacity course. I have this function, get_page(),
which returns the content of a page:
from urllib.request import urlopen

def get_page(url):
    '''
    Open the given url and return the content of the page.
    '''
    data = urlopen(url)
    html = data.read()  # raw bytes of the response body
    return html.decode('utf8')
The original function just returned data.read(), but that way I could not do operations like str.find(). After a quick search I found out that I need to decode the data. But now I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
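One thing I noticed: the bytes 0x1f 0x8b are the gzip magic number, so 0x8b at position 1 would be consistent with the server sending a gzip-compressed body. I have not confirmed that this is what happens for my URL. The sketch below only reproduces the same error offline with made-up data; `decode_body` and the sample HTML are hypothetical names used for illustration, not part of the course code.

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream


def decode_body(raw):
    '''
    Hypothetical helper: decompress raw if it looks like gzip, then decode as UTF-8.
    '''
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode('utf8')


# Made-up stand-in for what data.read() might return (assumption: gzip body).
raw = gzip.compress('<html>hello</html>'.encode('utf8'))

try:
    raw.decode('utf8')  # same call as in get_page()
except UnicodeDecodeError as exc:
    print(exc)  # byte 0x8b in position 1: invalid start byte

html = decode_body(raw)
print(html.find('hello'))
```

If this guess is right, decoding would have to happen after decompression, which is why the helper checks the first two bytes before calling decode().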
I have found similar questions on SO, but none of them dealt with this specific error. Please help.