I'm trying to scrape a title from a site using the requests and Beautiful Soup libraries. I don't understand why the Japanese characters are not displayed properly. My code, which scrapes the title of a Japanese torrent, is given below.

from bs4 import BeautifulSoup as bs
import requests


def main():
    current_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, "
                                     "like Gecko) Chrome/92.0.4515.159 Safari/537.36",
                       # 'ja' is the language code for Japanese ('jp' is not a valid code)
                       'Accept-Language': 'en-GB,ja,en;q=0.5',
                       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,'
                                 'image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                       # 'br' omitted: requests decodes Brotli only when a brotli package is installed
                       'Accept-Encoding': 'gzip, deflate'}
    session = requests.Session()
    session.headers.update(current_headers)

    # Keep the session and the response in separate variables
    response = session.get("https://nyaa.si/user/Mashin")
    if response.status_code == 200:
        print("Successful in getting site")
        print(response.encoding)
        # Pass the raw bytes so Beautiful Soup can detect the page encoding itself
        soup = bs(response.content, features="html.parser")
        for table_row in soup.find_all('tr')[1:]:  # skip the header row
            tds = table_row.find_all('td')
            print(tds[1].find("a", class_='')['title'])
            break  # only the first torrent is needed


if __name__ == '__main__':
    main()

Now the output I get is this:

Output

Whereas the torrent name should be like this:

[마신] [2021.08.25] TVアニメ「ひぐらしのなく頃に 卒」EDテーマ「Missing Promise」/鈴木このみ [MP3 320K]

Kindly help me understand the problem and what I am doing wrong. Thanks

Md. Fazlul Hoque

1 Answer

The problem seems to be with the Windows CMD console and its encoding handling rather than with the scraping code. I ran the command in the PyCharm terminal and it showed the Japanese characters properly. Similarly, when I write this data to a CSV file and open it in Microsoft Excel, the Japanese characters are not displayed there either, so I am using PyCharm to view the CSV file as well.
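As a rough sketch of one way to work around both symptoms, the snippet below switches standard output to UTF-8 and writes the CSV with the `utf-8-sig` codec, whose leading byte order mark is what Excel uses to recognise the file as UTF-8. The helper names `write_titles_csv` and `read_titles_csv` and the file name `titles.csv` are made up for illustration and are not part of the original code. Whether CMD then actually renders the glyphs may also depend on the console font, so treat this as a starting point rather than a guaranteed fix.

```python
import csv
import os
import sys
import tempfile

# Sample title taken from the question
TITLE = "[마신] [2021.08.25] TVアニメ「ひぐらしのなく頃に 卒」EDテーマ「Missing Promise」/鈴木このみ [MP3 320K]"


def write_titles_csv(path, titles):
    """Write one title per row (hypothetical helper, not from the original post)."""
    # "utf-8-sig" prepends a byte order mark so Excel detects the file as UTF-8
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title"])
        for title in titles:
            writer.writerow([title])


def read_titles_csv(path):
    """Read the titles back, dropping the header row (hypothetical helper)."""
    # Reading with "utf-8-sig" strips the byte order mark again
    with open(path, newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.reader(handle))
    return [row[0] for row in rows[1:]]


if __name__ == "__main__":
    # Python 3.7+: make print() emit UTF-8 regardless of the console code page
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")

    with tempfile.TemporaryDirectory() as folder:
        csv_path = os.path.join(folder, "titles.csv")  # assumed file name
        write_titles_csv(csv_path, [TITLE])
        for title in read_titles_csv(csv_path):
            print(title)
```

In the scraper itself, the same idea would mean calling `sys.stdout.reconfigure(encoding="utf-8")` once at the start of `main()` and opening the output CSV with `encoding="utf-8-sig"`.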