Here is a test file:

gunzip -c file_1.gz
Line 1
Line 2
Line 3

I am executing bash commands this way:

import subprocess
cmd = "gunzip -c file_1.gz | grep 3"
subprocess.call(cmd, shell=True)
Line 3

I need to run this command on several files in parallel, then join the processes. So it seems I have to use subprocess.Popen().communicate(). However, Popen does not interpret the pipe; it passes the pipe character and everything after it as arguments to the first command, gunzip in my case:

subprocess.Popen(cmd.split()).communicate()
gunzip: can't stat: | (|.gz): No such file or directory
gunzip: can't stat: grep (grep.gz): No such file or directory
gunzip: can't stat: 8 (8.gz): No such file or directory
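
To see where these errors come from, it may help to look at what cmd.split() produces. This is only a minimal sketch using the command string from above: without a shell, the `|` character is just another string in the argument list, so gunzip receives it as a file name.

```python
cmd = "gunzip -c file_1.gz | grep 3"

# str.split() only splits on whitespace; it knows nothing about shell syntax.
argv = cmd.split()
print(argv)
# → ['gunzip', '-c', 'file_1.gz', '|', 'grep', '3']

# Without shell=True, Popen runs argv[0] and hands everything else to it
# as plain arguments, including '|' and 'grep'.
program, arguments = argv[0], argv[1:]
```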

I would like to keep the whole command and to avoid separating it this way:

gunzip = subprocess.Popen('gunzip -c file_1.gz'.split(), stdout=subprocess.PIPE)
grep = subprocess.Popen('grep 3'.split(), stdin=gunzip.stdout, stdout=subprocess.PIPE)
gunzip.stdout.close()
output = grep.communicate()[0]
gunzip.wait()

Is there a way to not separate the commands and process the pipe correctly?

Ben Harrison
kaligne
  • What does "join the processes" mean? Do you want to capture the output of several processes running concurrently? Here's a [code example](http://stackoverflow.com/a/23616229/4279). Unrelated: your code is probably IO bound, i.e., there might be no point in reading the files in parallel unless they are already in memory. – jfs May 21 '16 at 03:02
  • Sorry for the delay. By joining the processes I mean waiting until all the grep processes have finished on each file. The answer you are referring to is noteworthy! – kaligne Aug 27 '16 at 12:43

1 Answer

The grep 3 command needs the output of the previous command as its input, and Popen does not interpret | when it is given an argument list, so there is no way to run the whole pipeline in a single subprocess.Popen call without shell=True.

If you always want to run grep 3 on all the files, you could join the output of all the gunzip -c file_x.gz commands and then run grep only once on the combined stream.

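import subprocess
# One way to join the output: let a single gunzip -c decompress every file into one stream.
gunzip = subprocess.Popen('gunzip -c file_1.gz file_2.gz'.split(), stdout=subprocess.PIPE)
# Run grep once on the combined stream.
grep = subprocess.Popen('grep 3'.split(), stdin=gunzip.stdout, stdout=subprocess.PIPE)
gunzip.stdout.close()  # let gunzip receive SIGPIPE if grep exits early
output = grep.communicate()[0]
gunzip.wait()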
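
For comparison, here is a minimal sketch of the approach the question asks about: keeping the whole command as one string and running it on several files in parallel. It assumes a POSIX shell with gunzip and grep available; the helper name grep_gz_parallel is hypothetical. Popen accepts shell=True just like subprocess.call, so the shell sets up the pipe, and all processes are started before any of them is waited on.

```python
import shlex
import subprocess


def grep_gz_parallel(paths, pattern):
    """Start one 'gunzip -c FILE | grep PATTERN' pipeline per file, then wait for all."""
    procs = []
    for path in paths:
        # shlex.quote guards against spaces or shell metacharacters in the arguments.
        cmd = "gunzip -c {} | grep {}".format(shlex.quote(path), shlex.quote(pattern))
        # shell=True hands the whole string to the shell, which interprets the pipe.
        procs.append(subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE))
    # Every pipeline is already running; communicate() reads each one's output
    # to EOF and waits for it to exit, which "joins" the processes.
    return [proc.communicate()[0].decode() for proc in procs]
```

Calling grep_gz_parallel(['file_1.gz', 'file_2.gz'], '3') would return one string of matching lines per file, in the same order as the input paths.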
Silu