I've got two Python scripts. The first makes a database query, parses the output, and writes a formatted form line-by-line (to a file or to `stdout`, as determined by command-line arguments processed and opened by `argparse`).
The second, after about a minute of parsing static files, reads line-by-line (from a file or `stdin`, again via `argparse`), processes that input, and writes its own output (to a file or `stdout`, also determined by `argparse`).
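For context, the I/O plumbing in both scripts looks roughly like this (a minimal sketch; the argument names and the use of `argparse.FileType` are illustrative, not copied from the real scripts):

```python
import argparse
import sys

# Minimal sketch of the shared I/O handling. argparse.FileType maps
# the literal argument '-' to sys.stdin / sys.stdout automatically.
parser = argparse.ArgumentParser()
parser.add_argument("infile", type=argparse.FileType("r"),
                    help="input file, or '-' for stdin")
parser.add_argument("outfile", nargs="?", type=argparse.FileType("w"),
                    default=sys.stdout,
                    help="output file; stdout if omitted")

def relay(infile, outfile):
    # Core loop of the second script: iterating a pipe is supposed to
    # block until the writer closes its end.
    for line in infile:
        outfile.write(line)
```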
Both work fine when run unpiped (`$ ./1.py argsfile midfile; ./2.py argsfile midfile outfile`).
On smaller queries from the first script, piping the two together also works fine, and is actually a good bit faster (`$ ./1.py - | ./2.py - outfile`).
However, when the database query in the first script is large, running the two separately still works, but piping them does not. My best guess is that the second script finishes its preprocessing, checks `stdin`, sees it empty because the first script hasn't written to it yet, and proceeds with nothing: parse nothing, return nothing, write nothing to the file.
The previous question/answer I've found on this seems to indicate that this shouldn't be possible: `for line in infile`, where `infile` is standard input, should block until the writer closes the pipe. I've also tried
    while True:
        line = infile.readline()
        if line == '':
            break
        else:
            pass # actual processing
but that doesn't work either. I can't drop the conditional and the `break` entirely, because then it would block forever; this isn't meant to read an endless stream, just piped input that takes a while to start arriving.
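For what it's worth, I believe the empty-string check itself is a correct EOF test: `readline()` returns `''` only at end-of-file, not for a momentarily empty but still-open pipe. A small sketch of that semantics, using `io.StringIO` as a stand-in for `stdin` (the end of the buffer plays the role of EOF):

```python
import io

# readline() returns '' only at end-of-file; on a still-open pipe it
# blocks instead of returning early. StringIO stands in for stdin.
infile = io.StringIO("first\nsecond\n")
lines = []
while True:
    line = infile.readline()
    if line == '':        # '' signals EOF, not "no data yet"
        break
    lines.append(line.rstrip('\n'))
```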
Keeping the two steps in separate scripts is a requirement due to business constraints (to minimize the difficulty of migrating away from the systems involved in either step, if that time ever comes).