I'm trying to scrape a title from a site using the requests and Beautiful Soup libraries, and I don't understand why the Japanese characters are not displayed properly. My code, which scrapes the title of a Japanese torrent, is given below.

from bs4 import BeautifulSoup as bs
import requests


def main():
    # Browser-like headers sent with every request made through the session
    current_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, "
                                     "like Gecko) Chrome/92.0.4515.159 Safari/537.36",
                       'Accept-Language': 'en-GB,jp,en;q=0.5',
                       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,'
                                 'image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                       'Accept-Encoding': 'gzip, deflate, br'}
    request = requests.session()
    request.headers.update(current_headers)
    # From here on, `request` holds the response object, not the session
    request = request.get("https://nyaa.si/user/Mashin")
    if request.status_code == 200:
        print("Successful in getting site")
        print(request.encoding)
        soup = bs(request.content, features="html.parser")
        # Skip the header row and print the title of the first torrent only
        for table_row in soup.find_all('tr')[1:]:
            tds = table_row.find_all('td')
            print(tds[1].find("a", class_='')['title'])
            break


if __name__ == '__main__':
    main()

Now the output I get is this:
The torrent name should instead look like this:
[마신] [2021.08.25] TVアニメ「ひぐらしのなく頃に 卒」EDテーマ「Missing Promise」/鈴木このみ [MP3 320K]
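
One check that might narrow this down, offered as a sketch rather than a confirmed diagnosis: the request advertises `br` in `Accept-Encoding`, so the server may reply with a Brotli-compressed body, and as far as I know requests only decompresses Brotli when a Brotli package is installed. The helper below, `body_left_compressed`, is a hypothetical name of my own, and its default `decodable` tuple (gzip and deflate only) is an assumption about the local environment.

```python
def body_left_compressed(content_encoding, decodable=("gzip", "deflate")):
    """Return True if the Content-Encoding header value names a coding
    that is not in `decodable`.

    `decodable` is an assumption: the codings the HTTP client is expected
    to undo on its own in this environment.
    """
    # The header may list several codings, e.g. "gzip, br"
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    # "identity" means no transformation was applied
    return any(c != "identity" and c not in decodable for c in codings)


if __name__ == "__main__":
    # Illustrative header values, not captured from the real site
    for value in ("br", "gzip", ""):
        print(repr(value), body_left_compressed(value))
```

In the script above it could be called as `body_left_compressed(request.headers.get("Content-Encoding", ""))` right after the status check. If it returns True, the bytes handed to Beautiful Soup may still be compressed, which would produce garbage regardless of the page's character set.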
Please help me understand what the problem is and what I am doing wrong. Thanks.