I have the following three Python scripts:

parent1.py:

import subprocess, os, sys

relpath = os.path.dirname(sys.argv[0])
path = os.path.abspath(relpath)
child = subprocess.Popen([sys.executable, os.path.join(path, 'child.py')], stdout=subprocess.PIPE)
sys.stdin = child.stdout
inp = sys.stdin.read()
print(inp.decode())

parent2.py:

import sys
inp = sys.stdin.read()
print(inp)

child.py:

print("This text was created in child.py")

If I call parent1.py with:

python3 parent1.py

it gives me the expected output:

This text was created in child.py

If I call parent2.py with:

python3 child.py | python3 parent2.py

I get the same output. But in the first example I get the output of child.py as bytes, and in the second I get it directly as a string. Why is this? Is it just a difference between Python and bash pipes, or is there something I could do differently to avoid this?

Kritzefitz
  • [try this](http://stackoverflow.com/questions/3999114/linux-pipe-into-python-ncurses-script-stdin-and-termios?answertab=votes#tab-top) – scott May 08 '13 at 17:50

1 Answer

When Python opens stdin and stdout, it detects what encoding to use and wraps them in text I/O, so you get Unicode strings.

But subprocess does not (and cannot) detect the encoding used by the subprocess you start, so it returns bytes. You can use an io.TextIOWrapper() instance to wrap the child.stdout pipe to provide Unicode data:

sys.stdin = io.TextIOWrapper(child.stdout, encoding='utf8')
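
A minimal, self-contained sketch of that wrapping, offered as an illustration rather than the exact setup from the question. It assumes the child writes UTF-8 (or plain ASCII), and it uses a hypothetical inline `CHILD_CODE` string and helper name `read_child_text` in place of the real child.py so that it runs without extra files:

```python
import io
import subprocess
import sys

# Hypothetical stand-in for child.py so the sketch needs no extra files.
CHILD_CODE = 'print("This text was created in child.py")'


def read_child_text(encoding="utf-8"):
    """Run the child process and return its stdout decoded as str."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
    )
    # child.stdout is a binary stream; the wrapper decodes it on read.
    with io.TextIOWrapper(child.stdout, encoding=encoding) as text_stream:
        text = text_stream.read()
    child.wait()
    return text


if __name__ == "__main__":
    print(read_child_text(), end="")
```

On Python 3.6 and later, passing `encoding='utf-8'` directly to `subprocess.Popen` should have much the same effect, since subprocess then opens the pipe in text mode itself.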
Martijn Pieters
  • Yep. I'd like to add that there is just one kind of pipe in the OS, and bash and Python use it in just the same way. The interpretation of a stream can be different, and Python distinguishes the two cases; in one it interprets the input as bytes, in the other as string/unicode. – Alfe May 08 '13 at 17:51
  • Thanks, that worked. If I now want to do something like 'cat /bin/bash | parent2.py', it raises a UnicodeDecodeError because sys.stdin.read() doesn't return bytes. Is there a way around this? – Kritzefitz May 08 '13 at 18:02
  • @Alfe: Well, it still _interprets_ the input as bytes in both cases, it just automatically wraps the stream in a `TextIOWrapper` for you in the latter case. You can get at the underlying byte stream, or manually attach your own wrapper, in either case. But still, a useful point. – abarnert May 08 '13 at 18:15
  • @IchUndNichtDu: `sys.stdin` is just a regular `TextIOWrapper` like anything else. (Try printing its `repr`.) So, you can get at the bytes inside it in all the usual ways. For example, you can get its `fileno()` and `os.read()` from that (but don't mix up reads from both the wrapper and the fileno!). – abarnert May 08 '13 at 18:17
  • @IchUndNichtDu: you could try to see if the object has an `encoding` attribute; if not, you need to wrap it; `if not hasattr(fileobj, 'encoding'): fileobj = io.TextIOWrapper(fileobj, encoding=...)`. – Martijn Pieters May 08 '13 at 18:21
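
The two suggestions in the comments above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the helper names `as_text` and `read_all_bytes` are hypothetical, UTF-8 is an assumed default, and the `buffer` attribute is the standard way to reach the binary stream underneath a `TextIOWrapper` such as `sys.stdin`:

```python
import io
import sys


def as_text(fileobj, encoding="utf-8", errors="strict"):
    """Return fileobj unchanged if it already decodes text, else wrap it.

    Mirrors the hasattr(fileobj, 'encoding') check from the comment above.
    """
    if hasattr(fileobj, "encoding"):
        return fileobj
    return io.TextIOWrapper(fileobj, encoding=encoding, errors=errors)


def read_all_bytes(text_stream):
    """Read raw bytes from the binary buffer underneath a text stream.

    Do not mix this with reads through the text wrapper itself.
    """
    return text_stream.buffer.read()
```

For binary input such as `cat /bin/bash | python3 parent2.py`, calling `read_all_bytes(sys.stdin)` returns undecoded bytes, which avoids the UnicodeDecodeError. Alternatively, wrapping a binary stream with `as_text(stream, errors="replace")` decodes it leniently, substituting a replacement character for any bytes that are not valid in the chosen encoding.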