I'm using Requests and BeautifulSoup with Python 3.4 to scrape information off a website that may or may not contain Japanese or other special characters.
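    import requests
    from bs4 import BeautifulSoup

    def startThisPage(url):
        r = requests.get(str(url))
        r.encoding = "utf8"
        print(r.content.decode('utf8'))  # decode the raw bytes as UTF-8 and print the whole page
        soup = BeautifulSoup(r.content, 'html.parser')
        print(soup.h2.string)  # print the text of the first <h2>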
The h2 contains this: "Fate/kaleid liner Prisma ☆ Ilya Zwei!" and I'm pretty sure the star is what is causing the trouble.
This is the error I get:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2606' in position 25: character maps to <undefined>
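Position 25 in the message matches the index of the star in the h2 text, so the failure seems to happen while encoding that string, not while decoding the page. Here is a minimal sketch that reproduces the same kind of failure without any network access. It assumes the output stream uses a legacy single-byte code page; cp1252 is only my guess (I have not confirmed which one is actually in use), and try_encode is a helper made up for this illustration.

```python
import sys

# The h2 text from the page, with the star written as an escape so this
# file itself stays ASCII-only.
title = "Fate/kaleid liner Prisma \u2606 Ilya Zwei!"


def try_encode(text, encoding):
    """Return (True, None) if text encodes cleanly, else (False, (index, char))."""
    try:
        text.encode(encoding)
        return True, None
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that could not be encoded
        return False, (exc.start, text[exc.start])


# UTF-8 can represent every Unicode character, so this should succeed.
utf8_result = try_encode(title, "utf-8")

# cp1252 is an ASSUMED stand-in for the console's code page; it has no
# mapping for U+2606, so encoding should fail at the star.
cp1252_result = try_encode(title, "cp1252")

# Show which encoding print() will actually use on this machine.
print(sys.stdout.encoding)
print(utf8_result)
# ascii() keeps the output printable even on a console that cannot show the star.
print(ascii(cp1252_result))
```

If the first line printed is something like cp1252 or cp437 rather than utf-8, that would suggest print() is where the exception is raised.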
The page is encoded as UTF-8, so I tried both encoding and decoding the byte string I get from r.content as UTF-8. I also tried decoding with unicode_escape first, thinking the problem was a doubled backslash, but that wasn't the case. Any ideas?