I'm trying to pipe an io.BytesIO() byte stream to a separate program using subprocess.Popen(), but I don't know how, or whether this is possible at all. The documentation and examples are all about text and newlines.

When I whip up something like this:

import io
from subprocess import *

stream = io.BytesIO()
someStreamCreatingProcess(stream)

command = ['somecommand', 'some', 'arguments']  
process = Popen(command, stdin=PIPE)
process.communicate(input=stream)

I get

Traceback (most recent call last):
  File "./test.py", line 9, in <module>
    procOut         = process.communicate(input=stream)
  File "/usr/lib/python2.7/subprocess.py", line 754, in communicate
    return self._communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 1322, in _communicate
    stdout, stderr = self._communicate_with_poll(input)
  File "/usr/lib/python2.7/subprocess.py", line 1384, in _communicate_with_poll
    chunk = input[input_offset : input_offset + _PIPE_BUF]
TypeError: '_io.BytesIO' object has no attribute '__getitem__'

I think Popen() is only for text. Am I wrong?
Is there a different way to do this?

Redsandro

2 Answers

As @falsetru said, you can't pass a BytesIO() object directly; you need to get a bytestring from it first. This implies that all content must already be written to the stream before you call stream.getvalue() and pass the result to process.communicate().
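
To illustrate why the content has to be complete first, here is a minimal sketch (the chunk contents are made up for the example). It shows that getvalue() returns a snapshot of the buffer, so anything written afterwards never reaches a child process that was given the earlier snapshot:

```python
import io

stream = io.BytesIO()
stream.write(b"first chunk, ")

# getvalue() copies out the bytes written so far.
snapshot = stream.getvalue()

# Later writes go into the buffer but not into the earlier snapshot.
stream.write(b"second chunk")

print(snapshot)            # only the data written before getvalue()
print(stream.getvalue())   # everything written so far
```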

If you want to stream the data instead of providing all input at once, you could drop the BytesIO() object and write to the pipe directly:

from subprocess import Popen, PIPE

process = Popen(['command', 'arg1'], stdin=PIPE, bufsize=-1)
someStreamCreatingProcess(stream=process.stdin) # many `stream.write()` inside
process.stdin.close() # done (no more input)
process.wait()

someStreamCreatingProcess() should not return until it is done writing to the stream. If it returns immediately instead, then it should call stream.close() itself once it has finished writing (in that case, remove process.stdin.close() from your code):

from subprocess import Popen, PIPE

process = Popen(['command', 'arg1'], stdin=PIPE, bufsize=-1)
someStreamCreatingProcess(stream=process.stdin) # many `stream.write()` inside
process.wait() # stream.close() is called in `someStreamCreatingProcess`
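
For the case where the producer returns immediately, the following is only a sketch under assumptions, written for Python 3: start_producer() is a hypothetical stand-in for someStreamCreatingProcess(), and the child is a small Python one-liner that counts the bytes it receives, used here in place of the real command. The producer writes from a background thread and closes the stream itself when done:

```python
import sys
import threading
from subprocess import Popen, PIPE


def start_producer(stream, chunks):
    """Hypothetical stand-in for someStreamCreatingProcess():
    returns immediately and keeps writing from a background thread."""
    def run():
        try:
            for chunk in chunks:
                stream.write(chunk)
        finally:
            # Closing the pipe is what tells the child there is no more input.
            stream.close()

    worker = threading.Thread(target=run)
    worker.start()
    return worker


# Stand-in child: reads all of stdin as bytes and prints how many it got.
child_code = "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))"

process = Popen([sys.executable, "-c", child_code], stdin=PIPE, stdout=PIPE)
worker = start_producer(process.stdin, [b"\x00" * 1024] * 4)

received = process.stdout.read()  # blocks until the child closes its stdout
worker.join()
process.stdout.close()
returncode = process.wait()
print(received, returncode)
```

Note that the main thread never touches process.stdin after handing it over; only the producer writes to it and closes it.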
jfs
  • Upvoted. What do you mean by 'should not'? Should not in general, or will not after using this code? Normally `someStreamCreatingProcess()` returns immediately but keeps streaming until it explicitly receives the command to stop streaming (video in this case). – Redsandro Dec 02 '13 at 07:36
  • @Redsandro: I've shown how to handle case when `someStreamCreatingProcess()` returns immediately. – jfs Dec 02 '13 at 07:44
  • @Sebastian: Thanks, I will give this a go when I'm back. – Redsandro Dec 02 '13 at 08:04
  • @Sebastian: I'm back, and I accepted your answer. :) – Redsandro Nov 17 '15 at 12:42

According to subprocess.Popen.communicate:

The optional input argument should be a string to be sent to the child process, or None, if no data should be sent to the child.


To get the (bytes) string value from a BytesIO object, use getvalue():

process.communicate(input=stream.getvalue())
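
As a self-contained sketch of the same call (Python 3; the payload is made up, and the child is a small Python one-liner standing in for the real command, echoing stdin back unchanged), binary data passes through communicate() untouched:

```python
import io
import sys
from subprocess import Popen, PIPE

stream = io.BytesIO()
stream.write(b"\x00\x01binary\r\npayload\xff")  # arbitrary non-text bytes

# Stand-in child: copies stdin to stdout as raw bytes.
child_code = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"

process = Popen([sys.executable, "-c", child_code], stdin=PIPE, stdout=PIPE)

# Pass the bytes held by the buffer, not the BytesIO object itself.
out, _ = process.communicate(input=stream.getvalue())

print(out == stream.getvalue())
```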
falsetru
  • _N.B._ In python a string is a group of _bytes_, which may, or may not, be interpreted as text by the code and the libraries depending on the context. It took me a little while to realise this myself. – Steve Barnes Dec 02 '13 at 05:05
  • Although this answer resolves the error message, the resulting pipe data is not accepted by e.g. `ffmpeg` _pipe:: Invalid data found when processing input._ When writing the stream directly to a file first, the file _is_ accepted by `ffmpeg`. I think @SteveBarnes is on to something that's also relevant in this case. – Redsandro Dec 02 '13 at 05:19
  • Try using `io.BufferedReader`, as buffering is always recommended for binary data. – Steve Barnes Dec 02 '13 at 05:34
  • It may be a good idea to add `universal_newlines=False, bufsize=-1` to your Popen call to ensure that newline chars are not being mangled and the correct buffering is used. I also think that you may have to do something about closing file descriptors. – Steve Barnes Dec 02 '13 at 05:47
  • @Redsandro, Does `stream` contain all data when you call `subprocess.Popen`? – falsetru Dec 02 '13 at 06:05
  • @SteveBarnes: `bufsize` doesn't affect correctness -- whether ffmpeg works or not -- only its time performance in this case. `universal_newlines` *is* `False` by default; you don't need to specify it explicitly. – jfs Dec 02 '13 at 06:44
  • @falsetru, Nope, `stream` is being streamed into for an undetermined period of time by `someStreamCreatingProcess(stream)`. Isn't that how this works? – Redsandro Dec 02 '13 at 07:30
  • @Redsandro, Then, J.F. Sebastian's answer is what you want. – falsetru Dec 02 '13 at 07:34
  • I found some comments about some Python versions defaulting `universal_newlines` to `True`, and adding an unnecessary parameter that matches the default is not usually a problem. – Steve Barnes Dec 02 '13 at 15:22
  • @SteveBarnes, Which version? [2.4](http://docs.python.org/2.4/lib/node235.html), [2.5](http://docs.python.org/2.5/lib/node528.html), [2.6](http://docs.python.org/2.6/library/subprocess.html#subprocess.Popen), [2.7](http://docs.python.org/2.7/library/subprocess.html#subprocess.Popen), [3.0](http://docs.python.org/3.0/library/subprocess.html#subprocess.Popen), [3.1](http://docs.python.org/3.1/library/subprocess.html#subprocess.Popen), [3.2](http://docs.python.org/3.2/library/subprocess.html#subprocess.Popen), [3.3](http://docs.python.org/3.3/library/subprocess.html#subprocess.Popen) – falsetru Dec 02 '13 at 15:31