I am trying to scrape the anime genres (the text of each "li.btnList a" element) from a Japanese website. My code is below:

from bs4 import BeautifulSoup
import requests
import lxml

D_ANIME_URL = "https://animestore.docomo.ne.jp/animestore/gen_sel_pc"

response = requests.get(url=D_ANIME_URL)
text = response.text
soup = BeautifulSoup(text, "lxml")
genre = soup.select("li.btnList a")
for gen in genre:
    print(gen.text)

For the result I get a UnicodeEncodeError:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-15: character maps to <undefined>

I have found similar posts that advise using

gen.text.encode('utf-8')

but that only gives me the text as bytes, which I can't read.
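
To make the two symptoms concrete, here is a minimal sketch that needs no network access. The string `sample` is a hypothetical stand-in for one `gen.text` value, not text taken from the site. It shows that `.encode('utf-8')` returns a `bytes` object, which `print` displays as escape sequences, and that cp1252 appears to have no mapping for Japanese characters, which would explain the `'charmap'` error.

```python
# Hypothetical stand-in for a single gen.text value (katakana for "anime").
sample = "アニメ"

# str.encode returns bytes, so print shows a b'...' literal with escapes
# instead of readable text.
encoded = sample.encode("utf-8")
print(type(encoded).__name__)
print(encoded)

# Decoding with the same codec gives the original string back.
round_trip = encoded.decode("utf-8")

# cp1252 cannot represent these characters, so encoding to it raises
# the same kind of error that print() hits on a cp1252 console.
try:
    sample.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
print(cp1252_ok)
```

If this reading is right, the string itself is fine and the failure happens only when Python converts it for the console.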

Here is the full traceback:

Traceback (most recent call last):
  File "c:\Development\ani-gen\sandbox.py", line 12, in <module>
    print(gen.text)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-10: character maps to <undefined>

Is there something I am missing? I am new to Python and web scraping, so any advice is appreciated!

totablue
  • Post the full traceback so we see where it came from. – tdelaney Jul 10 '22 at 16:19
  • Works for me. What is `sys.stdout.encoding`? – tdelaney Jul 10 '22 at 16:22
  • Can you put that into the question? Make it a code block like the script itself for readability. – tdelaney Jul 10 '22 at 16:29
  • I think your `gen.text` is a regular Python unicode string (you could verify with `print(type(gen.text))`) but your terminal doesn't support it. Check `sys.stdout.encoding` to see what your terminal will accept. So then the question is: what OS (including version) are you on? – tdelaney Jul 10 '22 at 16:31
  • I am on Windows 11, OS build 22000.739. `sys.stdout.encoding` is cp1252. – totablue Jul 10 '22 at 16:34
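
Given that `sys.stdout.encoding` is cp1252, one possible workaround is to switch the output stream to UTF-8 before printing. The sketch below simulates this with an in-memory stream instead of the real console, so the names `buffer`, `stream` and `text` are illustrative only. It relies on `io.TextIOWrapper.reconfigure`, available in Python 3.7 and later.

```python
import io

# In-memory stand-in for a console whose encoding is cp1252.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="cp1252", newline="\n")

text = "アニメ"  # hypothetical genre name, not taken from the site

# With cp1252 the write fails, as in the traceback from the question.
try:
    print(text, file=stream)
    failed_before = False
except UnicodeEncodeError:
    failed_before = True

# Switch the same stream to UTF-8; the identical print call now works.
stream.reconfigure(encoding="utf-8")
print(text, file=stream)
stream.flush()

written = buffer.getvalue().decode("utf-8")
```

In the actual script the equivalent call would be `sys.stdout.reconfigure(encoding="utf-8")` placed before the loop. Setting the environment variable `PYTHONIOENCODING=utf-8` or `PYTHONUTF8=1` before starting Python is another option. Whether the characters then display correctly still depends on the terminal and its font, which this sketch does not cover.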

0 Answers