Japanese characters are not showing when scraping through requests with UTF-8 encoding

Question

I'm trying to scrape a title from a site using the requests and Beautiful Soup libraries. I don't understand why the Japanese characters are not displayed properly. My code, which scrapes the title of a Japanese torrent, is given below.

from bs4 import BeautifulSoup as bs
import requests


def main():
    current_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, "
                                     "like Gecko) Chrome/92.0.4515.159 Safari/537.36",
                       # "ja" is the language tag for Japanese ("jp" is the country code)
                       'Accept-Language': 'en-GB,ja,en;q=0.5',
                       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,'
                                 'image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                       'Accept-Encoding': 'gzip, deflate, br'}
    session = requests.session()
    session.headers.update(current_headers)

    response = session.get("https://nyaa.si/user/Mashin")
    if response.status_code == 200:
        print("Successful in getting site")
        print(response.encoding)
        # Pass the raw bytes so BeautifulSoup detects the document encoding itself
        soup = bs(response.content, features="html.parser")
        # Skip the header row, then print the title of the first torrent only
        for table_row in soup.find_all('tr')[1:]:
            tds = table_row.find_all('td')
            print(tds[1].find("a", class_='')['title'])
            break


if __name__ == '__main__':
    main()

Now the output I get is this:

Whereas the torrent name should look like this:

[마신] [2021.08.25] TVアニメ「ひぐらしのなく頃に卒」EDテーマ「Missing Promise」／鈴木このみ [MP3 320K]

Kindly help me understand the problem and what I am doing wrong. Thanks

What terminal do you use? When I run your code, my Linux terminal shows the Japanese characters correctly. — Andrej Kesely, Aug 24 '21 at 21:55
I just ran the command in PyCharm and it shows the characters properly. — Moeed Azhar, Aug 24 '21 at 21:56
I don't have experience with Windows 10, but maybe this will help: https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window — Andrej Kesely, Aug 24 '21 at 21:56

Moeed Azhar · Accepted Answer · 2021-08-24T22:11:31.590

The problem seems to be with the Windows CMD console and its encoding, not with the scraping code. I ran the command in the PyCharm terminal and it showed the Japanese characters properly. Likewise, when I write this data to a CSV file and open the file in Microsoft Excel, the Japanese characters are not displayed there either, so I am using PyCharm to view the CSV file as well.
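
A minimal sketch of one possible workaround for both symptoms, assuming Python 3.7 or later. The helper names `print_utf8` and `write_titles_csv` are hypothetical and not part of the original code; `TITLE` is the expected title quoted in the question. Writing the CSV with the `utf-8-sig` codec adds a byte order mark, which Excel generally uses to recognise the file as UTF-8. Reconfiguring `sys.stdout` only changes the bytes Python emits; the CMD code page and font may still need to support Japanese.

```python
import csv
import os
import sys
import tempfile

# Expected title, copied from the question.
TITLE = "[마신] [2021.08.25] TVアニメ「ひぐらしのなく頃に卒」EDテーマ「Missing Promise」／鈴木このみ [MP3 320K]"


def print_utf8(text):
    """Print text after asking Python to encode stdout as UTF-8 (hypothetical helper)."""
    # TextIOWrapper.reconfigure is available from Python 3.7; skip it if stdout was replaced.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
    print(text)


def write_titles_csv(path, titles):
    """Write titles to a CSV file that starts with a UTF-8 byte order mark (hypothetical helper)."""
    # newline="" is what the csv module expects when it manages line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title"])
        for title in titles:
            writer.writerow([title])


if __name__ == "__main__":
    # Demo only: writes to a temporary directory rather than the working directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "titles.csv")
    write_titles_csv(demo_path, [TITLE])
    print_utf8(TITLE)
```

In the scraper, the value of `tds[1].find("a", class_='')['title']` could be passed to these helpers instead of being printed directly. If the console still shows garbled text, the data itself is likely fine and only the display is affected.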

edited Aug 24 '21 at 22:11

answered Aug 24 '21 at 22:02
