Python subprocess call stuck when used with grep

Question

I want to get the uncompressed size of a tar.gz file that is larger than 4 GB. I found a shell command that does this, and it works perfectly fine on its own. But when I run the same command from my Python program, it never completes.

I am running the script on RHEL 6.8.

Command to get the correct uncompressed file size

gzip -dc some_tar_gz.tar.gz | wc -c

My python script

import subprocess
import shlex
from pprint import pprint

command_list = shlex.split("gzip -dc some_tar_gz.tar.gz | wc -c")
result = subprocess.Popen(command_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out, err = result.communicate()
pprint(out)

The gzip command above returned the uncompressed file size in under 5 minutes, but the Python script above had not returned any result after 1 hour.

Edit 1:

When I removed shell=True and watched the Python process in top, it grew to around 27 GB VIRT and was then killed automatically. I can see the problem, but I don't know how to resolve it.
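
A likely explanation for both symptoms, sketched under the assumption of a POSIX system where shell=True runs /bin/sh -c: shlex.split turns the pipeline into a flat list in which "|" is just another string. With shell=True, only the first list item is used as the command string and the remaining items become the shell's positional parameters, so the shell probably runs a bare gzip that waits on standard input. Without shell=True, "|", "wc" and "-c" are handed to gzip as arguments, so the whole decompressed stream goes into the stdout pipe and communicate() buffers it in memory. The echo command below is only a harmless stand-in used to show the positional-parameter behavior.

```python
import shlex
import subprocess

# The pipe symbol is not interpreted here; it becomes an ordinary list item.
command_list = shlex.split("gzip -dc some_tar_gz.tar.gz | wc -c")
print(command_list)
# ['gzip', '-dc', 'some_tar_gz.tar.gz', '|', 'wc', '-c']

# Stand-in demo (echo instead of gzip): with shell=True and a list, the first
# item is the command string and the other items fill $0, $1, ... of the shell.
demo = subprocess.Popen(
    ["echo zero=$0 one=$1", "first", "second"],
    shell=True,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
demo_out, _ = demo.communicate()
print(demo_out.strip())
```

If shell=True is wanted, the whole pipeline would need to be passed as a single string rather than a list, so that the shell itself parses the "|".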

It's the shell pipeline that causes this. The subprocess docs outline an [approach on how to replace that](https://docs.python.org/3/library/subprocess.html#replacing-shell-pipeline). It boils down to using two Popen instances, one for each side of the pipe, using the first subprocess's stdout on the second one's stdin. — shmee, Jan 08 '19 at 09:54
Possible duplicate of [How to run " ps cax | grep something " in Python?](https://stackoverflow.com/questions/6780035/how-to-run-ps-cax-grep-something-in-python) — shmee, Jan 08 '19 at 09:54
@shmee Thanks, your approach worked flawlessly. Please move your comment to answer so that I can accept it as the correct answer. — Sam Si, Jan 08 '19 at 10:13
I'd prefer if you'd accept the duplicate flag instead. Despite the different shell commands, your question, or rather the underlying issue, is virtually identical to the question I referenced. I could not do much more than copy unutbu's answer on the other question :) — shmee, Jan 08 '19 at 10:23
@shmee I could only find the option to flag this question as a duplicate. I didn't find a way to accept the duplicate. — Sam Si, Jan 08 '19 at 10:34
Can't really tell :) Never was in that position myself. There seems to be some [recipe](https://meta.stackexchange.com/questions/250981/new-ui-encourages-askers-to-confirm-or-dispute-duplicate-votes/250974#250974) but it appears to require at least 15 rep. Let's hope my flag gets accepted by close vote reviewers eventually and does not just age away as it tends to happen a lot recently ... — shmee, Jan 08 '19 at 10:40

score 1 · Accepted Answer · answered Dec 13 '19 at 09:09

Working code in case someone has the same question

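import subprocess
import shlex
from pprint import pprint

# Build each side of the pipeline as its own argument list (no shell involved).
command_list_1 = shlex.split("gzip -dc some_tar_gz.tar.gz")
command_list_2 = shlex.split("wc -c")

# gzip writes the decompressed stream to a pipe; wc reads that pipe as its stdin.
p1 = subprocess.Popen(command_list_1, stdout=subprocess.PIPE)
p2 = subprocess.Popen(command_list_2, stdin=p1.stdout, stdout=subprocess.PIPE)
# Close the parent's copy of the pipe so gzip can receive SIGPIPE if wc exits early.
p1.stdout.close()

# Only wc's short output (the byte count) is read into Python's memory.
output = p2.communicate()[0]
pprint(output.rstrip())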
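
If spawning external processes is not a requirement, the same number can be obtained without a pipeline. The following is a minimal sketch of a different technique from the two-Popen approach: it assumes Python 3 and uses the standard gzip module, reading the decompressed stream in fixed-size chunks so that memory use stays small. The function name uncompressed_size and the chunk_size default are illustrative assumptions, not part of the original code.

```python
import gzip


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Count the decompressed bytes of a gzip file without holding them in memory."""
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            # Read at most chunk_size decompressed bytes per iteration.
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Calling uncompressed_size("some_tar_gz.tar.gz") should give the same count as gzip -dc some_tar_gz.tar.gz | wc -c, though it may be slower than the external gzip binary for very large archives.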
