I want to get the uncompressed size of a tar.gz file that is larger than 4 GB. I found a shell command that does this, and it works perfectly. But when I run the same command from my Python program, it never completes.

I am running the script on RHEL 6.8.

Command that returns the correct uncompressed file size:

gzip -dc some_tar_gz.tar.gz | wc -c

My Python script:

import subprocess
import shlex
from pprint import pprint

command_list = shlex.split("gzip -dc some_tar_gz.tar.gz | wc -c")
result = subprocess.Popen(command_list, stdout=subprocess.PIPE,   stderr=subprocess.PIPE, shell=True)
out, err = result.communicate()
pprint(out)

The gzip command above returned the uncompressed file size in under 5 minutes, but the Python script had not returned any result even after 1 hour.

Edit 1:

When I removed shell=True and watched the process in top, the Python process grew to around 27 GB VIRT and was then killed automatically. I understand what the problem is, but I don't know how to resolve it.
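Both symptoms are consistent with how the command string is tokenized. As a minimal sketch (assuming POSIX behaviour of subprocess), the snippet below shows that shlex.split treats the pipe character as an ordinary token:

```python
import shlex

# shlex.split only tokenizes; it has no notion of shell pipelines,
# so "|" comes back as a plain string like any other argument.
tokens = shlex.split("gzip -dc some_tar_gz.tar.gz | wc -c")
print(tokens)
# -> ['gzip', '-dc', 'some_tar_gz.tar.gz', '|', 'wc', '-c']
```

With shell=True and a list, only the first item ("gzip") is used as the command string on POSIX; the remaining items become arguments to the shell itself, so a bare gzip runs and waits on stdin. Without shell=True, gzip receives "|", "wc" and "-c" as extra file names, decompresses the archive to the stdout pipe, and communicate() buffers all of that output in memory.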

  • It's the shell pipeline that causes this. The subprocess docs outline an [approach on how to replace that](https://docs.python.org/3/library/subprocess.html#replacing-shell-pipeline). It boils down to using two Popen instances, one for each side of the pipe, using the first subprocess's stdout on the second one's stdin. – shmee Jan 08 '19 at 09:54
  • Possible duplicate of [How to run " ps cax | grep something " in Python?](https://stackoverflow.com/questions/6780035/how-to-run-ps-cax-grep-something-in-python) – shmee Jan 08 '19 at 09:54
  • @shmee Thanks, your approach worked flawlessly. Please move your comment to answer so that I can accept it as the correct answer. – Sam Si Jan 08 '19 at 10:13
  • I'd prefer it if you accepted the duplicate flag instead. Despite the different shell commands, your question, or rather the underlying issue, is virtually identical to the question I referenced. I could not do much more than copy unutbu's answer on the other question :) – shmee Jan 08 '19 at 10:23
  • @shmee I am new here. How do I accept a duplicate flag? – Sam Si Jan 08 '19 at 10:26
  • @shmee I could only find the option to flag this question as a duplicate. I didn't find a way to accept the duplicate. – Sam Si Jan 08 '19 at 10:34
  • Can't really tell :) Never was in that position myself. There seems to be some [recipe](https://meta.stackexchange.com/questions/250981/new-ui-encourages-askers-to-confirm-or-dispute-duplicate-votes/250974#250974) but it appears to require at least 15 rep. Let's hope my flag gets accepted by close vote reviewers eventually and does not just age away as it tends to happen a lot recently ... – shmee Jan 08 '19 at 10:40

1 Answer

Working code, in case someone has the same question:

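import subprocess
import shlex
from pprint import pprint

# One argument list per side of the pipe; no shell is involved.
command_list_1 = shlex.split("gzip -dc some_tar_file.tar.gz")
command_list_2 = shlex.split("wc -c")

# Feed gzip's stdout straight into wc's stdin so the decompressed
# data never passes through the Python process.
p1 = subprocess.Popen(command_list_1, stdout=subprocess.PIPE)
p2 = subprocess.Popen(command_list_2, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # Allow gzip to receive SIGPIPE if wc exits first.

# Only wc's short output (the byte count) is read into memory.
output = p2.communicate()[0]
p1.wait()  # Reap the gzip process once wc has finished.
pprint(output.rstrip())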
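As an alternative that avoids subprocesses altogether, here is a rough sketch using the standard gzip module to stream the archive and count bytes. It assumes Python 3; the function name uncompressed_size and the 1 MiB chunk size are illustrative choices, not something from the question, and it may well be slower than the gzip binary.

```python
import gzip


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Return the decompressed size of a gzip file in bytes.

    Hypothetical helper: reads the stream in fixed-size chunks so that
    memory use stays bounded regardless of the archive size.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # An empty bytes object signals end of stream.
                break
            total += len(chunk)
    return total
```

Calling uncompressed_size("some_tar_file.tar.gz") would return the count as an int, with no need to parse the output of wc.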