10

I'm using the subprocess module to call some external console commands from a Python script, and I need to pass file handles to it so I can get stdout and stderr back separately. Roughly, the code looks like this:

import os
import subprocess

stdout_file = open(os.path.join(local_path, 'stdout.txt'), 'w+')
stderr_file = open(os.path.join(local_path, 'stderr.txt'), 'w+')

subprocess.call(["somecommand", "someparam"], stdout=stdout_file, stderr=stderr_file)

This works fine, and text files with the relevant output get created. But it would be nicer to handle these outputs in memory, without creating files. So I used the StringIO module to handle it this way:

import subprocess
import StringIO

stdout_file = StringIO.StringIO()
stderr_file = StringIO.StringIO()

subprocess.call(["somecommand", "someparam"], stdout=stdout_file, stderr=stderr_file)

But this doesn't work. It fails with:

  File "./test.py", line 17, in <module>
    subprocess.call(["somecommand", "someparam"], stdout=stdout_file, stderr=stderr_file)
  File "/usr/lib/python2.7/subprocess.py", line 493, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 672, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib/python2.7/subprocess.py", line 1063, in _get_handles
    c2pwrite = stdout.fileno()
AttributeError: StringIO instance has no attribute 'fileno'

I see that StringIO is missing parts of the native file object's interface, and the call fails because of that.

So the question is more educational than practical: why are these parts of the file interface missing from StringIO, and is there a reason they cannot be implemented?

alexykot
  • This is sort of what `subprocess.check_output` is for. – Blender Oct 16 '13 at 16:45
  • hmm, thing is that `subprocess.check_output` throws out all output as one string, while I need to separate **stdout** from **stderr** – alexykot Oct 16 '13 at 16:58
  • hmm, found a workaround using Popen instead of subprocess. Explained here http://stackoverflow.com/questions/10103551/passing-data-to-subprocess-check-output – alexykot Oct 16 '13 at 17:01
  • selffix - Popen is a part of subprocess, so instead of subprocess.call() or subprocess.check_output() – alexykot Oct 16 '13 at 17:15
  • See http://stackoverflow.com/questions/5903501/attributeerror-stringio-instance-has-no-attribute-fileno. Yes, this is a bug in the standard library. – Charles Merriam Mar 05 '17 at 17:34
  • Original post was done in 2013. Solution for this problem is not on my priority list anymore, but thanks anyway. – alexykot Mar 07 '17 at 16:42

3 Answers

10

As you said in your comment, Popen and Popen.communicate are the right solution here.

A bit of background: real file objects have file descriptors, which is the `fileno` attribute StringIO objects are missing. File descriptors are just ordinary small integers: you may be familiar with descriptors 0, 1 and 2, which are stdin, stdout and stderr, respectively. If a process opens more files, they're assigned 3, 4, 5, and so on. You can take a look at a process's current file descriptors with `lsof -p <pid>`.
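For illustration, this sketch (Python 3, where the same gap surfaces as `io.UnsupportedOperation` rather than the `AttributeError` from the question) compares a real file with a StringIO:

```python
import io
import tempfile

# Real file objects expose an integer file descriptor via fileno().
with tempfile.TemporaryFile() as f:
    print(f.fileno())    # some small integer, e.g. 3

# StringIO has no descriptor to hand out. In Python 3, calling fileno()
# raises io.UnsupportedOperation; in Python 2 the attribute was simply absent.
buf = io.StringIO()
try:
    buf.fileno()
except io.UnsupportedOperation:
    print("StringIO has no file descriptor")
```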

So, why can't StringIO objects have file descriptors? In order to get one, it'd need to either open a file or open a pipe*. Opening a file wouldn't make sense, since not opening files is the whole point of using StringIO in the first place.

And opening a pipe also wouldn't make sense, even though they live in memory like StringIO objects do. They're for communication, not storage: seek, truncate, and len have no meaning at all for pipes, and read and write behave very differently than they do for files. When you read from a pipe, the returned data is deleted from the pipe's buffer, and if that (relatively small) buffer is full when you try to write, your process will hang until something reads from the pipe to free up buffer space.
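The pipe semantics described above can be seen with a raw `os.pipe`: reading consumes the buffered data, and once the write end is closed the reader just sees EOF:

```python
import os

r, w = os.pipe()            # raw pipe: a (read, write) pair of file descriptors

os.write(w, b"hello")
print(os.read(r, 1024))     # b'hello' -- reading consumes the data
os.close(w)                 # close the write end...
print(os.read(r, 1024))     # b'' -- ...so the reader sees EOF
os.close(r)
```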

So if you want to use a string as stdin, stdout or stderr for a subprocess, StringIO won't cut it but Popen.communicate is perfect. As stated above (and warned about in subprocess's docs), reading from and writing to pipes correctly is complicated. Popen handles that complexity for you.
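A minimal sketch of that approach (Python 3 shown; the child command here is just the interpreter writing to both streams, as a portable stand-in):

```python
import io
import subprocess
import sys

# Spawn a child whose stdout and stderr go into pipes that Popen manages.
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,                     # decode bytes to str
)
out, err = proc.communicate()      # drains both pipes without deadlocking

# If a file-like object is needed afterwards, wrap the captured strings:
stdout_buffer = io.StringIO(out)
stderr_buffer = io.StringIO(err)
```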

* I guess I could theoretically imagine a third kind of file descriptor corresponding to a memory region shared between processes? Not really sure why that doesn't exist. But eh, I'm not a kernel developer, so I'm sure there's a reason.

Vanessa Phipps
  • 3
    [The documentation](https://docs.python.org/3.6/library/subprocess.html#frequently-used-arguments) is misleading. It says that the value of `stdout` can be "*an existing [file object](https://docs.python.org/3.6/glossary.html#term-file-object)*", not mentioning that it requires the file object to have a `fileno`. – Franklin Yu Jun 06 '17 at 21:39
  • The traceback is also unnecessarily cryptic. If subprocess places specific requirements on the file object, it should detect the problem early and raise a clear error message expressing that, rather than surfacing whatever duck-typing happens to throw at you. – Erik Carstensen Sep 07 '22 at 14:23
0

If you want to redirect the stdout or stderr to a StringIO in real-time, you will have to do it concurrently. Here is an example using asyncio in Python 3.11:

import asyncio
import io
from subprocess import SubprocessError

# Maximum number of bytes to read at once from the 'asyncio.subprocess.PIPE'
_MAX_BUFFER_CHUNK_SIZE = 1024

# Buffers for stdout and stderr
stdout_buffer = io.StringIO()
stderr_buffer = io.StringIO()

async def run_cmd_async(command, check=False):
    process = await asyncio.subprocess.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)

    async def write_stdout() -> None:
        assert process.stdout is not None
        while chunk := await process.stdout.read(_MAX_BUFFER_CHUNK_SIZE):
            stdout_buffer.write(chunk.decode())

    async def write_stderr() -> None:
        assert process.stderr is not None
        while chunk := await process.stderr.read(_MAX_BUFFER_CHUNK_SIZE):
            stderr_buffer.write(chunk.decode())

    async with asyncio.TaskGroup() as task_group:
        task_group.create_task(write_stdout())
        task_group.create_task(write_stderr())

        exit_code = await process.wait()
        if check and exit_code != 0:
            raise SubprocessError(
                f"Command '{command}' returned non-zero exit status {exit_code}."
            )
    return exit_code


# Run your command and print output
asyncio.run(run_cmd_async(["somecommand", "someparam"], check=True))
print(stdout_buffer.getvalue())
print(stderr_buffer.getvalue())

Then you could add a separate asynchronous task that gets the current value of the stdout and stderr buffers to do something with them in real-time.

-2

I think you are expecting some other process to know how to read memory as a stream out of your main process, which it cannot do. But if you pipe your in-memory data into the child's standard input and pipe its standard output back into your own buffer, you might be successful.
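That idea can be sketched like this (Python 3; `subprocess.run` with `input` and `capture_output` does the piping in both directions, and the child command here is just a placeholder):

```python
import subprocess
import sys

# Send an in-memory string to the child's stdin and capture its stdout,
# using pipes rather than trying to share memory between processes.
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    input="hello",
    capture_output=True,
    text=True,
)
print(result.stdout)    # 'HELLO'
```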

Fred Mitchell