This is driving me somewhat nutty at the moment. It is clear from my last few days of research that Unicode is a complex topic, but here is behavior that I do not know how to address.
If I read a file with non-ASCII characters from disk and write it back to a file, everything works as planned. However, when I read the same content from sys.stdin, it does not work and the non-ASCII characters are not encoded properly. The sample code is here:
    # -*- coding: utf-8 -*-
    import sys

    # Round trip 1: disk -> disk works as expected.
    with open("testinput.txt", "r") as ifile:
        lines = ifile.readlines()
    with open("testout1.txt", "w") as ofile:
        for line in lines:
            ofile.write(line)

    # Round trip 2: stdin -> disk mangles the non-ASCII characters.
    with open("testout2.txt", "w") as ofile:
        for line in sys.stdin:
            ofile.write(line)
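As a first check, here is a minimal diagnostic sketch (my own addition, run with the same cat pipe as below); it prints the codecs Python has attached to the standard streams, which I suspect is where the difference between the two cases comes from:

    # Diagnostic sketch: print the codecs attached to the standard streams.
    import sys

    print(sys.stdin.encoding)   # may be None on Python 2 when input is piped
    print(sys.stdout.encoding)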
The input file testinput.txt is this:
を
Sōten_Kōro
When I run the script from the command line as

    cat testinput.txt | python test.py

I get the following output files, respectively:
testout1.txt:
を
Sōten_Kōro
testout2.txt:
???
S??ten_K??ro
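One workaround I am considering, assuming this is Python 3 and the piped bytes really are UTF-8 (utf8_in below is just my name for the re-wrapped stream), is to decode sys.stdin explicitly instead of relying on the locale:

    import io
    import sys

    # Re-wrap the raw byte stream with an explicit UTF-8 decoder
    # instead of trusting whatever codec the locale attached to sys.stdin.
    utf8_in = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8")

    with open("testout2.txt", "w", encoding="utf-8") as ofile:
        for line in utf8_in:
            ofile.write(line)

On Python 2 the analogue would be codecs.getreader("utf-8")(sys.stdin), and setting PYTHONIOENCODING=utf-8 in the environment should have a similar effect, but I have not verified that either of these actually fixes the output above.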
Any ideas on how to address this would be of great help. Thanks, Paul.