Python scripts piped in bash - consumer gets no data when producer is slow?

Question

I've got two Python scripts. The first runs a database query, parses the output, and writes the formatted result line by line (to a file or to stdout, depending on command-line arguments that argparse processes and opens).

The second spends about a minute parsing static files, then reads its input line by line (from a file or stdin, again via argparse), processes it, and writes its own output (file or stdout, determined by argparse).

Both work fine when unpiped (`$ ./1.py argsfile midfile; ./2.py argsfile midfile outfile`).

On smaller queries from the first script, piping the two together also works fine, and is actually a good bit faster (`$ ./1.py - | ./2.py - outfile`).

However, when the database query of the first script is large, doing the two separately still works but piping them does not - my best guess is that the second script is finishing its preprocessing, checking stdin, seeing it empty because the first script hasn't written to it yet, and proceeding on with nothing. Parse nothing, return nothing, write nothing to file.

The previous question/answer I've found on this seems to indicate that this shouldn't be possible: `for line in infile`, where `infile` is standard input, should block until the writing end is closed. I've also tried

while True:
    line = infile.readline()
    if line == '':
        break
    else:
        pass # actual processing

but that doesn't work either. I can't drop the conditional and the break entirely because then it would block forever, and this is not meant for reading an endless stream, just some piped input that's taking a while to start arriving.
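For reference, here is a minimal sketch of the behaviour I would expect from that loop, assuming the pipe is in its default blocking mode. It does not use the real scripts: `slow_producer`, `read_all_lines`, and `demo` are hypothetical names, and a thread with `os.pipe()` stands in for `1.py`. In this setup `readline()` waits for the delayed writer and only returns `''` once the write end has been closed.

```python
import os
import threading
import time


def slow_producer(write_fd, delay, lines):
    """Stand-in for 1.py: wait `delay` seconds, write the lines, then close."""
    time.sleep(delay)
    with os.fdopen(write_fd, "w") as out:
        for text in lines:
            out.write(text + "\n")
    # Leaving the with-block closes the write end, which is what signals EOF.


def read_all_lines(infile):
    """Same loop shape as in the question: stop only on the empty string."""
    received = []
    while True:
        line = infile.readline()
        if line == "":  # returned only after every writer has closed the pipe
            break
        received.append(line.rstrip("\n"))
    return received


def demo(delay=0.5):
    """Return the lines read and how long the reader had to wait."""
    read_fd, write_fd = os.pipe()
    producer = threading.Thread(
        target=slow_producer, args=(write_fd, delay, ["a", "b", "c"])
    )
    start = time.monotonic()
    producer.start()
    with os.fdopen(read_fd) as infile:
        received = read_all_lines(infile)
    producer.join()
    return received, time.monotonic() - start


if __name__ == "__main__":
    lines, elapsed = demo()
    print(lines, round(elapsed, 2))
```

If the real consumer sees an empty input, the sketch suggests looking for something that closes the producer's end early (a crash or early exit in `1.py`, for example) rather than for a reader that gives up too soon.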

Keeping it in separate scripts is a requirement due to business constraints (minimize the number of difficulties in changing away from the systems involved in either of the two steps, if the time ever comes).

Output to a pipe is buffered; `1.py` isn't writing enough to fill up the buffer, so `2.py` has to wait until either `1.py` completes and closes its end of the pipe or it writes enough to the buffer for the OS to flush it to `2.py`. — chepner, Oct 04 '16 at 21:02
Ignoring SIGPIPE strikes me as a Bad Thing here -- if you're having a FD close early, you should probably figure out *why* that's happening. If you could [build a reproducer](http://stackoverflow.com/help/mcve), that would be helpful -- right now, this is a "I have this thing that shouldn't ever happen" post, but without sufficient code or details on how to make that thing *happen*; it's hard to debug without something concrete. — Charles Duffy, Oct 04 '16 at 21:07
@CharlesDuffy The only code that actually touches the file/pipe boils down to `for element in parsed: outfile.write(json.dumps(element) + '\n')` to write, and either `for line in infile` or the `readline` thing from the question to read. Can't test if that's sufficient to replicate until tomorrow. Meanwhile, attaching debuggers to a piped script seems to involve waiting for all input and redirecting it so the debugger can see the keyboard... Which is effectively what I did by writing to an intermediate file, which didn't have the problem. — Vivian, Oct 05 '16 at 03:55
Eh? Attaching a debugger requires no such thing. Run `yourscript` in your pipeline as you normally would (maybe with some delays inserted to give you a chance to attach), and use the gdb `attach` command to attach to it from a completely different console window. — Charles Duffy, Oct 05 '16 at 03:59
...now, granted, for a Python process it's a little more interesting, but only a little -- there are plenty of off-the-shelf recipes to allow a UNIX socket to be used to connect to your debugger. See (as my very first Google result) https://pypi.python.org/pypi/rpcpdb, or https://pypi.python.org/pypi/manhole (which actually looks like the more mature of the two), or the WingIDE debugger, or https://pypi.python.org/pypi/remote-pdb -- this is a wheel that's reinvented on a darned near continual basis. — Charles Duffy, Oct 05 '16 at 04:01
@CharlesDuffy Removed the mention of the `signal` ignoring - I removed that line from the code and it changed nothing. — Vivian, Oct 05 '16 at 15:24
Good to hear. Where are you on having `1.py` and `2.py` implementations you can include in the question that repro the problem? — Charles Duffy, Oct 05 '16 at 15:25
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/125008/discussion-between-david-heyman-and-charles-duffy). — Vivian, Oct 05 '16 at 15:30
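On the buffering point raised in the comments, a rough sketch of a producer that flushes after every line might look like the following. It is built around the write loop quoted above; `write_records` is a hypothetical helper name, not something from the actual `1.py`. Flushing should only change when the consumer sees each line, not whether it sees it, since buffered data is still written out when the file is closed normally.

```python
import io
import json
import sys


def write_records(records, outfile=None):
    """Write one JSON document per line, flushing after each line.

    Returns the number of records written.
    """
    if outfile is None:
        outfile = sys.stdout
    count = 0
    for element in records:
        outfile.write(json.dumps(element) + "\n")
        # Hand the line to the pipe now instead of waiting for the
        # process-side buffer to fill or for the file to be closed.
        outfile.flush()
        count += 1
    return count


if __name__ == "__main__":
    write_records([{"id": 1}, {"id": 2}])
```

Running the producer with `python -u` is another way to get unbuffered stdout without touching the code, although that only applies when writing to stdout and not to a file opened by argparse.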
