I want to process the output of a running program line by line (think tail -f) with a Python 3 script (on Linux).
The program's output, which is piped to the script, is encoded in latin-1, so in Python 2 I used the codecs module to decode the input from sys.stdin properly:
#!/usr/bin/env python
import sys, codecs
sin = codecs.getreader('latin-1')(sys.stdin)
for line in sin:
    print '%s "%s"' % (type(line), line.encode('ascii', 'xmlcharrefreplace').strip())
This worked:
<type 'unicode'> "Hi! &#246;&#228;&#223;"
...
However, in Python 3, sys.stdin.encoding is UTF-8, and if I just read naively from stdin:
#!/usr/bin/env python3
import sys
for line in sys.stdin:
    print('type:{0} line:{1}'.format(type(line), line))
I get this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 4: invalid start byte
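The failure does not seem to depend on the pipe itself. As a minimal sketch, assuming the piped line is the latin-1 encoding of the sample text Hi! öäß (the variable names here are only illustrative), decoding the same bytes by hand fails at the same position, while decoding them as latin-1 works:

```python
# Assumed sample: the latin-1 bytes of "Hi! öäß", as the program would emit them.
raw = "Hi! öäß".encode("latin-1")

try:
    # This mirrors what a UTF-8 text stream does with the incoming bytes.
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode("latin-1")

print(error)
print(decoded)
```

So the bytes themselves are fine; the problem is only that stdin is being decoded with the wrong codec.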
How can I read non-UTF-8 text data piped to stdin in Python 3?