1

I am trying to calculate the total size of several files. This is my script:

import os
date = raw_input('Enter date in format YYYYMMDD ')
file1 = 'p_poupe_' + date + '.tar.gz.done'
file2 = 'p_poupw_' + date + '.tar.gz.done'
file3 = 'p_pojk_' + date + '.tar.gz.done'

a1 = os.system('zcat ' + file1 + '|wc --bytes')
a2 = os.system('zcat ' + file2 + '|wc --bytes')
a3 = os.system('zcat ' + file3 + '|wc --bytes')

print a1,a2,a3
sum = a1 + a2 + a3

print sum

But the values are not being stored in the variables. Can anyone tell me what I am doing wrong? How can I modify the script so that the values are stored in variables instead of just being printed as output?

user2922822
  • `os.system` will be returning the return code of `wc` not the output to `stdout` – Peter Wood Mar 13 '15 at 12:37
  • possible duplicate of [What is the return value of os.system() in Python?](http://stackoverflow.com/questions/6466711/what-is-the-return-value-of-os-system-in-python) – Peter Wood Mar 13 '15 at 12:38
  • [os.path.getsize](https://docs.python.org/2/library/os.path.html#os.path.getsize) should get the work done... `os.system` return value is not stdout of created process. – Łukasz Rogalski Mar 13 '15 at 12:42
  • Also note that `sum()` is a built-in function; naming a variable after anything defined by the language itself is considered bad practice. You could use `print(sum([a1, a2, a3]))`, assuming a1, a2 and a3 held the values you were looking for. – Torxed Mar 13 '15 at 13:05

5 Answers

1

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

https://docs.python.org/2/library/os.html#os.system

The problem is that you're using exit codes rather than stdout data as your "values". You're probably looking for something like subprocess.Popen. Alternatively, code the solution directly in Python by opening the files yourself.
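As a rough sketch of the subprocess.Popen route (one possible reading of the suggestion, not the only one), the two commands can be connected without a shell and the second command's stdout read back into Python. The helper names `pipe_output` and `uncompressed_size_via_zcat` are made up for illustration, and the second one assumes `zcat` and `wc` are available on the PATH.

```python
import subprocess


def pipe_output(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` without a shell; return second_cmd's stdout."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Close our copy of the pipe so first_cmd sees SIGPIPE if second_cmd exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output


def uncompressed_size_via_zcat(path):
    # Hypothetical helper: assumes zcat and wc exist on this system.
    return int(pipe_output(['zcat', path], ['wc', '--bytes']).strip())
```

With this, `a1 = uncompressed_size_via_zcat(file1)` would hold an integer that can be summed, rather than an exit status.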

Try using https://docs.python.org/3/library/gzip.html

import gzip

def get_fcont_len(fname):
    # Decompress the whole file into memory and return its length in bytes
    with gzip.open(fname) as f:
        return len(f.read())

total = 0
date = raw_input('Enter date in format YYYYMMDD ')
total += get_fcont_len('p_poupe_' + date + '.tar.gz.done')
total += get_fcont_len('p_poupw_' + date + '.tar.gz.done')
total += get_fcont_len('p_pojk_' + date + '.tar.gz.done')
print(total)
Torxed
1

os.system returns the exit status of the command, not the output of the command. To capture the output of a command you should look into the subprocess module.

subprocess.check_output("zcat " + file1 + " | wc --bytes", shell=True)
# Returns the uncompressed size of file1 in bytes, as a string with a trailing newline character

However, it is probably better to use other Python modules or methods for this, as others have suggested, since it is preferable to do things directly in Python.
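For what doing it directly in Python might look like, here is a minimal sketch that decompresses in fixed-size chunks so that a large archive never has to sit in memory all at once. The function name `gunzipped_length` and the default chunk size are arbitrary choices for illustration.

```python
import gzip


def gunzipped_length(path, chunk_size=1024 * 1024):
    """Count decompressed bytes by reading the file chunk by chunk."""
    total = 0
    with gzip.open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty result means end of file
                break
            total += len(chunk)
    return total
```

This returns the same number that `zcat file | wc --bytes` prints, already as an integer.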

El Bert
1

The uncompressed file size (modulo 2^32, so it is only exact for data smaller than 4 GiB) is stored in the last 4 bytes of the gzip file. This function will return the size of the uncompressed file, i.e. the "gunzipped" size:

import os
import gzip
import struct

def get_gunzipped_size(filename):
    with gzip.open(filename) as f:
        _ = f.read(1)    # elicit IOError if file is not a gzip file
        f.fileobj.seek(-4, os.SEEK_END)
        # '<I' = little-endian unsigned 32-bit; a signed read goes negative above 2 GiB
        return struct.unpack('<I', f.fileobj.read(4))[0]

On large files this is much faster than reading all of the uncompressed data and counting its length because the whole file does not need to be decompressed.

Fitting this into your code:

import os

date = raw_input('Enter date in format YYYYMMDD ')
prefixes = ('p_poupe_', 'p_poupw_', 'p_pojk_')
files = ['{}{}.tar.gz.done'.format(prefix, date) for prefix in prefixes]

total_uncompressed = sum(get_gunzipped_size(f) for f in files)
print total_uncompressed
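One caveat with the trailer approach: the gzip format (RFC 1952) defines that field, ISIZE, as the uncompressed size modulo 2^32, and for a file made of several concatenated gzip members it only describes the last member. A minimal sketch that reads the field as an unsigned value straight from the raw file, under the assumption that the file is a single-member gzip smaller than 4 GiB (the name `read_isize` is hypothetical):

```python
import os
import struct


def read_isize(path):
    """Return the gzip ISIZE trailer field: uncompressed size modulo 2**32."""
    with open(path, 'rb') as raw:
        raw.seek(-4, os.SEEK_END)  # the trailer is the last 4 bytes of the file
        # '<I' = little-endian unsigned 32-bit integer
        return struct.unpack('<I', raw.read(4))[0]
```

If the archives might exceed 4 GiB or be concatenated, counting the decompressed bytes is slower but gives the real total.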
mhawke
  • "with gzip.open(filename) as f" is showing syntax error, don't know why. – user2922822 Mar 16 '15 at 07:40
  • There should be a colon at the end of the `with` statement, i.e. `with gzip.open(filename) as f:` – mhawke Mar 16 '15 at 07:59
  • @user2922822 : I have just made a minor change to the code that reads the file size so that this code will also work in Python 3 (it's mandatory to pass the number of bytes to be read). – mhawke Mar 16 '15 at 08:15
  • _Colon_ (`:`), not semi-colon (`;`). Other than that, I don't know what else could be wrong - it works for me in Python 2 and Python 3. What version of Python are you using? The `with` statement has been available as a "future" import since 2.5 (requires `from __future__ import with_statement` before use), and is enabled by default in 2.6. – mhawke Mar 16 '15 at 09:59
0

You can use the os module to get the size of a file once it has been extracted from the archive. Try this:

import os
import tarfile

tar = tarfile.open("yourFile.tar.gz")
tar.extractall("folderWithExtractedFiles")
print os.path.getsize("folderWithExtractedFiles/yourFileInsideTarGz")
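If writing the extracted files to disk is not wanted, the member sizes can also be read from the tar headers. This is only a sketch; `total_member_size` is a made-up name, and the result is the sum of the contained files' sizes, which is smaller than the `zcat | wc --bytes` figure because that one also counts tar headers and padding.

```python
import tarfile


def total_member_size(tar_path):
    """Sum the sizes of regular files inside a (possibly compressed) tar archive."""
    # The default mode 'r' detects gzip/bz2 compression transparently.
    with tarfile.open(tar_path) as tar:
        return sum(member.size for member in tar.getmembers() if member.isfile())
```

The archive is still decompressed while the headers are walked, but nothing is extracted.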
Yuri Malheiros
0

You can capture the output of a command using the getoutput function from the commands module:

import commands as cm
.
.
.
a1 = cm.getoutput('zcat ' + file1 + '|wc --bytes')
a2 = cm.getoutput('zcat ' + file2 + '|wc --bytes')
a3 = cm.getoutput('zcat ' + file3 + '|wc --bytes')

# Note that the outputs are strings, so convert them to numbers (byte counts are whole numbers, so int fits)
a1, a2, a3 = int(a1), int(a2), int(a3)

print a1,a2,a3
sum = a1 + a2 + a3

print sum
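The commands module exists only in Python 2. On Python 3 the equivalent call is `subprocess.getoutput`, which also runs the command through the shell. A small sketch, with `command_output_as_int` as a hypothetical helper name:

```python
import subprocess


def command_output_as_int(command):
    """Run a shell command and parse its output as an integer."""
    # getoutput returns stdout and stderr combined, with the trailing newline
    # stripped, so a failing command makes int() raise ValueError.
    return int(subprocess.getoutput(command).strip())
```

For example, `command_output_as_int('zcat ' + file1 + ' | wc --bytes')` would give the byte count as an integer, assuming `zcat` and `wc` are installed and the file name contains no shell metacharacters.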
Irshad Bhat