I have a subtitle file encoded in UTF-8 that contains Chinese characters. It's tiny, so here is the file.
So far I've managed to read the file using
with open(path) as f:
    text = f.read().decode('utf-8-sig').encode('utf-8')
print text[:100]
All I get is the usual mis-encoding mess:
1
00:00:20,160 --> 00:00:22,660
派拉蒙电影公å¸
2
00:00:32,160 --> 00:00:36,660
åŽçº³å…„弟ç
I ran chcp 65001 in cmd.exe before running the Python script. What am I doing wrong?
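For reference, the garbage above looks like the file's UTF-8 bytes being displayed as Windows-1252. The sketch below reproduces that pattern for the first three characters of the first subtitle line. The choice of cp1252 is an assumption inferred from the output, not something confirmed about this console, and the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the mojibake by reading UTF-8 bytes as Windows-1252.
# Assumption: the display layer uses cp1252 (guessed from the output above).

# First three characters of the first subtitle line, written as escapes
# so the example does not depend on the source file's own encoding.
original = u'\u6d3e\u62c9\u8499'

# These are the bytes that .encode('utf-8') hands to print.
utf8_bytes = original.encode('utf-8')

# This is how a cp1252 display would interpret those same bytes.
mojibake = utf8_bytes.decode('cp1252')

# Reversing the wrong step recovers the text, so the data itself is intact.
restored = mojibake.encode('cp1252').decode('utf-8')
```

If that assumption holds, the bytes read from the file are fine and the problem is in how the output is rendered, not in the decode step.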