I am trying to scrape the NBA game predictions from FiveThirtyEight. I usually use urllib2 and BeautifulSoup to scrape data from the web. However, the HTML returned for this page is very strange: it is a string of bytes such as "\x82\xdf\x97S\x99\xc7\x9d", and I cannot decode it into regular text. Here is my code:
from urllib2 import urlopen
html = urlopen('http://projects.fivethirtyeight.com/2016-nba-picks/').read()
This method works on other websites and other pages on 538, but not this one.
Edit: I tried to decode the string using
html.decode('utf-8')
and the method located here, but I got the following error message:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte
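One observation that may be relevant: the byte 0x8b at position 1 matches the second byte of the gzip magic number (0x1f 0x8b), so the response might be gzip-compressed rather than badly encoded. Below is a minimal sketch for checking that, assuming compression really is the cause. The helper name `maybe_gunzip` is hypothetical (it is not part of urllib2), and the sketch uses only the standard `zlib` module.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(raw):
    """Return raw decompressed if it looks like gzip data, else unchanged.

    Hypothetical helper for illustration; assumes the whole body is a
    single gzip stream.
    """
    if raw[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    return raw
```

If this guess is right, passing the result of `urlopen(...).read()` through `maybe_gunzip` before calling `.decode('utf-8')` should yield readable HTML; if the bytes do not start with the gzip magic number, the data comes back untouched.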