I wrote a script to scrape the video titles from a YouTube playlist page.
Everything works fine, according to print statements, until I try to write the titles to a text file, at which point I get "UnicodeEncodeError: 'charmap' codec can't encode characters in position...".
I've tried adding "encoding='utf8'" when I open the file, and while that fixes the error, all the Chinese characters show up as gibberish in the file.
I also tried encoding the output string with 'replace' and then decoding it, but that just replaces all the special characters with question marks.
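The three symptoms above can be reproduced without any scraping. This is a minimal sketch, assuming the default file encoding on my machine is cp1252 (which the 'charmap' error suggests) and that whatever I open the file with also decodes it as cp1252; the sample string `title` is made up for illustration and is not taken from the playlist.

```python
# Hypothetical sample title; any text outside cp1252 should behave the same way.
title = "中文"

# 1. Strict encoding to cp1252 raises the 'charmap' UnicodeEncodeError.
try:
    title.encode("cp1252")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# 2. errors='replace' avoids the exception but swaps each character for '?'.
replaced = title.encode("cp1252", "replace").decode("cp1252")

# 3. The UTF-8 bytes are valid, but decoding them as cp1252 yields gibberish.
utf8_bytes = title.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")

# ascii() keeps the output printable on a console that cannot show the text.
print(type(strict_error).__name__)
print(replaced)
print(ascii(mojibake))
print(utf8_bytes.decode("utf-8") == title)
```

If the last line prints True, the UTF-8 data itself is intact, which would mean the gibberish comes from how the file is being read rather than how it is written.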
Here is my code:
from bs4 import BeautifulSoup as BS
import urllib.request
import re
playlist_url = input("gib nem: ")
with urllib.request.urlopen(playlist_url) as response:
    playlist = response.read().decode('utf-8')
soup = BS(playlist, "lxml")
title_attrs = soup.find_all(attrs={"data-title":re.compile(r".*")})
titles = [tag["data-title"] for tag in title_attrs]
titles_str = '\n'.join(titles)#.encode('cp1252','replace').decode('cp1252')
print(titles_str)
with open("playListNames.txt", "a") as f:
    f.write(titles_str)
And here is the sample playlist I've been using to test: https://www.youtube.com/playlist?list=PL3oW2tjiIxvSk0WKXaEiDY78KKbKghOOo
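For comparison, the file-writing step at the end of the script can be isolated from the scraping. The sketch below writes a stand-in for `titles_str` with an explicit `encoding="utf-8"` and reads it back with the same encoding. The sample titles and the temporary output path are assumptions for illustration, not the real playlist data.

```python
import os
import tempfile

# Hypothetical stand-in for the scraped titles.
titles_str = "\n".join(["中文 title", "plain title"])

# Write to a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "playListNames.txt")

# Explicit UTF-8 on write, instead of the platform default encoding.
with open(out_path, "a", encoding="utf-8") as f:
    f.write(titles_str)

# Explicit UTF-8 on read as well; the text should round-trip unchanged.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == titles_str)
```

If the round trip succeeds here but the file still looks wrong in an editor, the editor's encoding setting would be the next thing to check.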