Piping latin-1 encoded output of a program to a Python 3 script

Question

I want to process the output of a running program line-by-line (think tail -f) with a Python 3 script (on Linux).

The program's output, which is piped to the script, is encoded in latin-1, so in Python 2 I used the codecs module to decode sys.stdin properly:

#!/usr/bin/env python
import sys, codecs

sin = codecs.getreader('latin-1')(sys.stdin)
for line in sin:
    print '%s "%s"' % (type (line), line.encode('ascii','xmlcharrefreplace').strip())

This worked:

<type 'unicode'> "Hi! &#246;&#228;&#223;"
...
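For comparison, the same ASCII diagnostic carries over to Python 3 unchanged once a line has been decoded to a `str`. The sketch below assumes the text is hardcoded rather than read from stdin. `to_ascii_refs` is a hypothetical helper name:

```python
def to_ascii_refs(line):
    """Render a decoded line as ASCII, replacing every non-ASCII character
    with an XML character reference (hypothetical helper name)."""
    return line.encode("ascii", "xmlcharrefreplace").decode("ascii").strip()


# Hardcoded sample standing in for one decoded input line.
sample = "Hi! öäß\n"
print('{0} "{1}"'.format(type(sample), to_ascii_refs(sample)))
```

In Python 3 the type shows as `<class 'str'>` where Python 2 showed `<type 'unicode'>`.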

However, in Python 3, sys.stdin is already a text stream decoded with the locale's encoding by default (here sys.stdin.encoding is UTF-8), so if I just read naively from stdin:

#!/usr/bin/env python3
import sys

for line in sys.stdin:
    print('type:{0} line:{1}'.format(type(line), line))

I get this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 4: invalid start byte
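The error can be reproduced without a pipe. The sketch below assumes the program emits the latin-1 bytes for "Hi! öäß". In latin-1, byte 0xF6 is "ö", but UTF-8 lead bytes only run from 0xC2 to 0xF4, so a UTF-8 decoder rejects 0xF6 at offset 4, right after the four bytes of "Hi! ":

```python
# latin-1 encoding of "Hi! öäß": bytes 0xF6, 0xE4, 0xDF are ö, ä, ß.
data = "Hi! öäß\n".encode("latin-1")

try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    # err.start is the offset of the first byte the UTF-8 decoder rejected.
    print(err.start, hex(data[err.start]), err.reason)

# Decoding with the encoding the program actually used succeeds.
print(data.decode("latin-1"), end="")
```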

How can I read non-UTF-8 text piped to stdin in Python 3?

unutbu · Accepted Answer · score 4 · 2011-03-15T12:10:46.963

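import sys
import io

# Wrap stdin's file descriptor (0) in a new text stream that decodes
# with latin-1 instead of the locale's default encoding.
# Leaving the with block closes file descriptor 0.
with io.open(sys.stdin.fileno(), 'r', encoding='latin-1') as sin:
    for line in sin:
        # Each line is already a decoded str.
        print('type:{0} line:{1}'.format(type(line), line))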

yields

type:<class 'str'> line:Hi! öäß
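A variation on the same idea is to wrap the binary buffer that sys.stdin already exposes (sys.stdin.buffer) instead of opening file descriptor 0 a second time. The sketch below rests on a few assumptions:

* `iter_decoded_lines` is a hypothetical helper name.
* `io.BytesIO` stands in for `sys.stdin.buffer` so the example runs without a pipe.

```python
import io


def iter_decoded_lines(binary_stream, encoding="latin-1"):
    """Yield decoded text lines from a binary stream (hypothetical helper)."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding)
    try:
        # TextIOWrapper decodes incrementally as it reads, so lines can be
        # processed while the input is still arriving (as with tail -f).
        yield from text
    finally:
        # Detach so that discarding the wrapper does not close binary_stream.
        text.detach()


# In the real script the argument would be sys.stdin.buffer instead.
fake_stdin = io.BytesIO("Hi! öäß\n".encode("latin-1"))
for line in iter_decoded_lines(fake_stdin):
    print("type:{0} line:{1}".format(type(line), line), end="")
```

Unlike the `with io.open(...)` version, this leaves the underlying stream open when the loop ends. On Python 3.7 and later, calling `sys.stdin.reconfigure(encoding="latin-1")` before anything has been read from stdin is another option.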

edited Mar 15 '11 at 12:10

answered Mar 15 '11 at 00:59


Works like a charm, and I can deal with the text directly, perfect! Thanks! – phoibos Mar 15 '11 at 02:18

score 2 · Answer 2 · edited May 23 '17 at 12:25

Take a look at the documentation for sys.stdin. The relevant part is:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach() streams can be made binary by default. This function sets stdin and stdout to binary:

import sys

def make_streams_binary():
    # Replace the text wrappers with their underlying binary buffers.
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

After doing this, the streams yield bytes, which you can decode from whatever encoding the program used (latin-1 in this case).
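The decoding step can look like this. The sketch below rests on two assumptions:

* It treats the detached stdin as any binary file object, with `io.BytesIO` standing in for `sys.stdin.detach()` so the example runs on its own.
* The input is latin-1, as stated in the question.

Because latin-1 maps each of the 256 byte values to exactly one character, this decode never raises and the round trip back to bytes is lossless.

```python
import io

# In the script this would be: binary_stdin = sys.stdin.detach()
binary_stdin = io.BytesIO(b"Hi! \xf6\xe4\xdf\nsecond line\n")

for raw in binary_stdin:
    # Iterating a binary stream yields bytes lines, newline included.
    line = raw.decode("latin-1")
    print(type(raw).__name__, "->", type(line).__name__, repr(line))

# latin-1 decoding is total: every byte value maps to one character,
# so encoding the result back reproduces the original bytes.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```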

Also see this post: How to set sys.stdout encoding in Python 3?
The suggestion from that post was to use the following (for this question's input, pass "latin-1" instead of "utf-8"):

import codecs, sys
sys.stdin = codecs.getreader("utf-8")(sys.stdin.detach())
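A runnable version with the latin-1 codec is sketched below. It assumes `io.BytesIO` in place of `sys.stdin.detach()` so it needs no pipe. `codecs.getreader` returns a StreamReader class, and its instances decode as they read and can be iterated line by line:

```python
import codecs
import io

# In the script this would be:
#   sys.stdin = codecs.getreader("latin-1")(sys.stdin.detach())
binary_stdin = io.BytesIO(b"Hi! \xf6\xe4\xdf\nsecond line\n")
reader = codecs.getreader("latin-1")(binary_stdin)

# Each item is a decoded str with its line ending kept.
lines = [line for line in reader]
for line in lines:
    print("type:{0} line:{1}".format(type(line), line), end="")
```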
