I am trying to scrape the anime genres (the text of the "li.btnList a" elements) from a Japanese website. My code is below:
from bs4 import BeautifulSoup
import requests
import lxml
D_ANIME_URL = "https://animestore.docomo.ne.jp/animestore/gen_sel_pc"
response = requests.get(url=D_ANIME_URL)
text = response.text
soup = BeautifulSoup(text, "lxml")
genre = soup.select("li.btnList a")
for gen in genre:
    print(gen.text)
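If the goal is only to collect the genre names, one way to avoid the console encoding entirely is to write them to a file opened with an explicit UTF-8 encoding. Here is a minimal sketch. The helper save_genres and the file name are hypothetical, and the sample strings stand in for the scraped gen.text values.

```python
from pathlib import Path


def save_genres(genres, path):
    """Write one genre name per line, always encoded as UTF-8.

    Because the file's encoding is set explicitly, the console's code page
    (cp1252 in the traceback) is never used for this text.
    """
    path = Path(path)
    path.write_text("\n".join(genres) + "\n", encoding="utf-8")
    return path
```

In the script above, this could replace the print loop, for example save_genres([gen.text for gen in genre], "genres.txt"). Open the resulting file in an editor that reads UTF-8.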
Running it, I get a UnicodeEncodeError:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-15: character maps to <undefined>
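The same error can be reproduced without any scraping. print() must encode each string with the console's encoding, and here that encoding is cp1252: the message names the 'charmap' codec, and the traceback further down passes through encodings\cp1252.py. cp1252 has no mapping for Japanese characters. Here is a minimal sketch that uses a hypothetical sample string in place of gen.text:

```python
sample = "ジャンル"  # hypothetical stand-in for one gen.text value

try:
    # print() performs this conversion implicitly when the console uses cp1252
    sample.encode("cp1252")
    error = None
except UnicodeEncodeError as exc:
    error = exc  # keep a reference, because 'exc' is cleared when the block ends

# str(error) is plain ASCII, so printing it is safe even on a cp1252 console
print(error)
```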
I have found similar posts that advise using
gen.text.encode('utf-8')
but that only gives me the text as bytes, which I can't read.
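That advice changes the type, not the text. str.encode returns a bytes object, and printing bytes shows their ASCII repr (escape sequences such as \xe3) instead of characters. That ASCII repr is also why this version no longer raises the error. Decoding with the same codec returns the original string. Here is a short sketch with a hypothetical sample value:

```python
sample = "ジャンル"  # hypothetical stand-in for one gen.text value

encoded = sample.encode("utf-8")   # bytes, not str
decoded = encoded.decode("utf-8")  # back to the original str

# Each of these katakana characters takes 3 bytes in UTF-8, so the bytes
# object is longer than the string it came from.
print(type(encoded).__name__, len(sample), len(encoded))
```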
Here is the full traceback:
Traceback (most recent call last):
  File "c:\Development\ani-gen\sandbox.py", line 12, in <module>
    print(gen.text)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-10: character maps to <undefined>
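The traceback shows print() going through encodings\cp1252.py, which means sys.stdout is encoding its output as cp1252. On Python 3.7+ (this traceback is from 3.10), one common approach is to switch the stream to UTF-8 with io.TextIOWrapper.reconfigure. The sketch below uses a hypothetical helper, use_utf8_output, and demonstrates it on a simulated cp1252 stream so it runs anywhere. Whether the characters then display correctly still depends on the terminal reading the output as UTF-8.

```python
import io
import sys


def use_utf8_output(stream=None):
    """Switch a text stream (sys.stdout by default) to UTF-8 encoding."""
    if stream is None:
        stream = sys.stdout
    # reconfigure() exists on io.TextIOWrapper since Python 3.7; some
    # replacement streams may not provide it, so those are left unchanged.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


# Simulate a console that encodes with cp1252, as in the traceback.
# newline="\n" disables newline translation so the bytes match on every OS.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")

use_utf8_output(console)
print("ジャンル", file=console)  # hypothetical genre text
console.flush()
```

In the actual script, calling use_utf8_output() once near the top, before the print loop, would apply this to sys.stdout. Setting the environment variable PYTHONIOENCODING=utf-8 before running the script is another commonly used option with a similar effect.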
Is there something I am missing? I am new to Python and web scraping, so any advice is appreciated!