How to properly redirect stdin to multiple subprocesses created sequentially?

Question

Context

I am experimenting with a script that is similar to vegeta's ramp-requests.py. In this script, I am running multiple subprocesses sequentially using subprocess.run(), and expect the standard input of the script to be redirected to those subprocesses during their entire lifetime (5s each).

#!/usr/bin/env python3

import json
import os
import subprocess
import sys
import time

rates = [1.0, 2.0, 3.0, 4.0]

# Run vegeta attack
for rate in rates:
    filename='results_%i.bin' % (1000*rate)
    if not os.path.exists(filename):
        cmd = 'vegeta attack -format=json -lazy --duration 5s -rate %i/1000s -output %s' % (1000*rate, filename)
        print(cmd, file=sys.stderr)
        subprocess.run(cmd, shell=True, encoding='utf-8')

I invoke the script as follows, piping it an endless stream of inputs, one per line. vegeta reads this input continuously until --duration has elapsed:

$ target-generator | ./ramp-requests.py

Problem

The first subprocess (rate=1.0) seems to receive stdin as I expect, and the command runs successfully, every time.

The second iteration (rate=2.0), however, fails silently, along with all subsequent iterations. If I inspect the corresponding report files (e.g. results_2000.bin) using the vegeta report command, I see fragments of errors such as parse error: syntax error near offset 0 of 'ource":["c...'.

My intuition is telling me that the second subprocess started consuming the input where the first one left it, in the middle of a line, but injecting a sys.stdin.readline() after subprocess.run() doesn't solve it. If that is the case, how can I cleanly solve this issue and ensure each subprocess starts reading from a "good" position?

If the subprocess reads stdin until it gets EOF, there's nothing left in the pipe for subsequent processes to read. — Barmar, Nov 12 '20 at 00:02
Also, many programs use buffered input. So the first subprocess may buffer input from the pipe that it doesn't use. It won't be available for the next subprocess to read. — Barmar, Nov 12 '20 at 00:08
In this case `target-generator` keeps generating inputs indefinitely, until it receives a SIGTERM, SIGINT or SIGPIPE, so the subprocess shouldn't get EOF. `vegeta`'s lazy mode is designed for receiving inputs from such generator. From what I can tell it works fine with 1 subprocess, even for a long period of time at high rates. — Antoine Cotten, Nov 12 '20 at 00:30
I don't think there's a good solution to this. It's buffering input, so it reads ahead in the pipe and the next invocation starts in the middle of a line. — Barmar, Nov 12 '20 at 00:39
I was hoping I could call `stdin.readline()` to "reposition" the standard input, bummer. Appreciate the comments though! — Antoine Cotten, Nov 12 '20 at 00:46
Python also uses buffered input. The problem is that the operating system doesn't provide any way to read a line at a time, except from terminals. — Barmar, Nov 12 '20 at 00:47
So unless the application reads a character at a time, which is extremely inefficient, it will read ahead. — Barmar, Nov 12 '20 at 00:48
Note that the use of `shell=True` coupled with a string substituted into the filename placeholder opens you up to shell injection attacks. That is to say: if your script (or something that calls it) is told to write data to a file named `$(rm -rf ~)`, someone is liable to have a bad day. — Charles Duffy, Nov 12 '20 at 04:35
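
One way to address the shell-injection concern raised in the last comment is to pass the command as an argument list and drop shell=True. The sketch below is only an illustration: build_attack_cmd is a hypothetical helper that is not part of the original script, and its flags are copied verbatim from the command string in the question.

```python
import subprocess


def build_attack_cmd(rate, filename):
    """Return the question's vegeta command as an argv list (hypothetical helper)."""
    return [
        'vegeta', 'attack',
        '-format=json',
        '-lazy',
        '--duration', '5s',
        '-rate', '%i/1000s' % (1000 * rate),
        # The filename is passed as one argument and never parsed by a shell.
        '-output', filename,
    ]


# Usage (requires vegeta on PATH; stdin is inherited from the script as before):
# subprocess.run(build_attack_cmd(1.0, 'results_1000.bin'))
```

Because no shell is involved, a filename such as `$(rm -rf ~)` reaches the program as a literal string.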

score 1 · Answer 1 · answered Nov 12 '20 at 04:28

Read a number of lines from stdin in your parent process, and pass that to your child process as -its- stdin. Repeat as needed. In this way, you do not need to worry about a child process making a mess of your stdin.

Feel free to borrow ideas from https://stromberg.dnsalias.org/~strombrg/mtee.html
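
A minimal sketch of the idea in this answer, under stated assumptions: run_with_lines is a hypothetical helper, and the caller has to decide how many lines each child needs (for a timed vegeta attack that would be roughly rate times duration, which is an estimate and not something the answer specifies).

```python
import itertools
import subprocess
import sys


def run_with_lines(source, cmd, n_lines, **run_kwargs):
    """Read up to n_lines whole lines from `source` and hand them to cmd as its stdin.

    `source` is a binary file object, for example sys.stdin.buffer.
    """
    batch = b''.join(itertools.islice(source, n_lines))
    # The child only sees this batch, so it cannot read ahead into the
    # parent's input and leave it in the middle of a line.
    return subprocess.run(cmd, input=batch, **run_kwargs)


# Usage (hypothetical command list):
# run_with_lines(sys.stdin.buffer, ['vegeta', 'attack', '-format=json'], 5000)
```

Any read-ahead now happens only in the parent's own buffer, so the next batch starts on a line boundary.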

HTH

dstromberg

That will work indeed. My only concern, which pushed me to open this question, was that with rates that approach 100k requests/sec, the number of lines I have to buffer becomes really high. In this example, each "attack" lasts only 5s, but in practice they can last up to 60s (so I can assess that autoscaling does happen at certain rates, etc.). – Antoine Cotten Nov 12 '20 at 08:16
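
The buffering concern in this comment could be sidestepped by forwarding lines one at a time instead of collecting them first. The following is a sketch only, not something proposed in the thread: stream_lines is a hypothetical helper, and it assumes the child exits on its own (as vegeta does once --duration has elapsed), at which point writes to its stdin fail with BrokenPipeError.

```python
import subprocess


def stream_lines(source, cmd):
    """Forward whole lines from `source` to cmd's stdin until the child exits.

    Returns (lines_forwarded, child_return_code).
    """
    forwarded = 0
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for line in source:
            if proc.poll() is not None:
                break  # child already exited; stop forwarding
            proc.stdin.write(line)
            proc.stdin.flush()
            forwarded += 1
    except BrokenPipeError:
        pass  # child closed its stdin or exited while we were writing
    finally:
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass
        proc.wait()
    return forwarded, proc.returncode
```

Only one line is held in the parent at a time. Lines that were written to the pipe but never consumed by the child are lost, but they are whole lines, so the next child still starts on a line boundary.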

score 0 · Accepted Answer · answered Nov 12 '20 at 10:59

As mentioned in @Barmar's comments, Python 3 opens stdin in buffered text mode, so both sys.stdin.read(1) and sys.stdin.readline() cause a read ahead and do not reposition the sys.stdin stream to the beginning of a new line.

There is, however, a way to disable buffering by reopening the file descriptor behind sys.stdin in binary mode with buffering=0, as pointed out by Denilson Sá Maia in his answer to "Setting smaller buffer size for sys.stdin?":

unbuffered_stdin = os.fdopen(sys.stdin.fileno(), 'rb', buffering=0)
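
The difference between the two modes can be checked on a throwaway pipe. The helper below is hypothetical and only for illustration: it writes three lines into a pipe, reads one line through either a buffered text reader or an unbuffered binary reader, then looks at what is still left in the pipe.

```python
import os


def first_line_and_leftover(buffered):
    """Read one line from a pipe, then return it with the bytes still in the pipe."""
    r, w = os.pipe()
    os.write(w, b'line1\nline2\nline3\n')
    os.close(w)
    if buffered:
        # Default text mode: reads a large chunk ahead of the requested line.
        reader = os.fdopen(r, 'r', encoding='utf-8')
    else:
        # Raw binary mode: readline() stops right after the first b'\n'.
        reader = os.fdopen(r, 'rb', buffering=0)
    with reader:
        first = reader.readline()
        leftover = os.read(r, 1024)  # what another reader of the pipe would get
    return first, leftover


print(first_line_and_leftover(buffered=True))   # ('line1\n', b'')
print(first_line_and_leftover(buffered=False))  # (b'line1\n', b'line2\nline3\n')
```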

With this unbuffered object, readline() consumes bytes only up to and including the next '\n', so calling it after each subprocess returns discards the truncated remainder of the line the subprocess stopped in:

# Run vegeta attack
for rate in rates:
  # [...]

  cmd = 'vegeta attack [...]'
  subprocess.run(cmd, shell=True, encoding='utf-8')

  # Read potentially truncated input until the next '\n' byte
  # to reposition stdin to a location that is safe to consume.
  unbuffered_stdin.readline()
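
The same repositioning step can be tried without vegeta. In this sketch, which assumes a POSIX system, run_then_resync is a hypothetical helper, and a tiny Python child that stops reading in the middle of a line stands in for the real attack command.

```python
import os
import subprocess
import sys


def run_then_resync(cmd, raw_stdin):
    """Run cmd with raw_stdin as its stdin, then skip to the next line start.

    Returns the discarded bytes (the tail of the line the child stopped in).
    """
    subprocess.run(cmd, stdin=raw_stdin)
    return raw_stdin.readline()


# Stand-in for the script's stdin: a pipe holding three lines.
r, w = os.pipe()
os.write(w, b'aaaa\nbbbb\ncccc\n')
os.close(w)
raw = os.fdopen(r, 'rb', buffering=0)

# The child consumes 7 bytes, b'aaaa\nbb', and exits in the middle of line 2.
child = [sys.executable, '-c', 'import os; os.read(0, 7)']
discarded = run_then_resync(child, raw)
next_line = raw.readline()
print(discarded, next_line)  # b'bb\n' b'cccc\n'
```

If a child happens to stop exactly on a line boundary, this step discards one complete line, which is harmless when the input is an endless stream of interchangeable targets.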

Printing the read line shows something similar to the output below:

b'a4b-b142-fabe0e96a6ca"],"Ce-Type":["perf.drill"],"Ce-Source":["load-test"]}}\n'

All subprocesses are now being executed successfully:

$ for r in results_*.bin; do vegeta report "$r"; done
[...]
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5
Error Set:
[...]
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:7
Error Set:
[...]
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:8
Error Set:
[...]
