Does anybody have any code for converting a tar.gz file into a zip using only Python code? I have been facing many issues with tar.gz, as mentioned in the question "How can I read tar.gz file using pandas read_csv with gzip compression option?"
-
It must be a problem in the csv not the tar.gz – Marlon Abeykoon Sep 01 '16 at 07:52
-
I wish that was true, but I tried with three different tar.gz files. – Geet Sep 01 '16 at 08:07
-
@Geet - Do you have feedback on answers? – sancho.s ReinstateMonicaCellio Apr 13 '18 at 03:37
2 Answers
You would have to use the tarfile module, with mode 'r|gz' for reading. Then use zipfile for writing.
import tarfile, zipfile
tarf = tarfile.open( name='mytar.tar.gz', mode='r|gz' )
zipf = zipfile.ZipFile( file='myzip.zip', mode='a', compression=zipfile.ZIP_DEFLATED )
for m in tarf:
    f = tarf.extractfile( m )
    if f is None:
        # extractfile() returns None for members without file data (e.g. directories)
        continue
    fl = f.read()
    fn = m.name
    zipf.writestr( fn, fl )
tarf.close()
zipf.close()
You can use tarfile.is_tarfile() to check for a valid tar file.
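A minimal sketch of that check, using only the standard library. It builds a small throwaway tar.gz plus a plain text file (both file names, and the helper looks_like_tar, are made up for the demo) and runs tarfile.is_tarfile() on each.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def looks_like_tar(path):
    """Return True if `path` can be opened as a tar archive (compressed or not)."""
    return tarfile.is_tarfile(str(path))


# Build a tiny tar.gz and a plain text file to compare (hypothetical names).
workdir = Path(tempfile.mkdtemp())
good = workdir / "sample.tar.gz"
bad = workdir / "not_a_tar.txt"

payload = b"a,b\n1,2\n"
with tarfile.open(str(good), mode="w:gz") as tarf:
    info = tarfile.TarInfo(name="data.csv")
    info.size = len(payload)
    tarf.addfile(info, io.BytesIO(payload))

bad.write_text("just some text\n")

good_ok = looks_like_tar(good)
bad_ok = looks_like_tar(bad)
print(good_ok, bad_ok)
```

Running this kind of check before the conversion loop should help separate "the archive is broken" from "the CSV inside is broken".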
Perhaps you could also use shutil, but I think it cannot work in memory.
PS: From the brief testing that I performed, members m that are directories need care, because tarf.extractfile() returns None for them. You can test for them with m.isdir(), or even first get the info on each tar file member with tarf.getmembers() and then reopen the tar.gz file for transferring to zip, since in stream mode you cannot read the members again after tarf.getmembers() (you cannot seek backwards).
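A rough sketch of one way to handle directories, working fully in memory. Note that it swaps the stream mode 'r|gz' for the seekable mode 'r:gz', so getmembers() can be called up front without reopening the archive; that swap, the function name targz_bytes_to_zip_bytes, and the demo data are my assumptions, not part of the code above.

```python
import io
import tarfile
import zipfile


def targz_bytes_to_zip_bytes(targz_bytes):
    """Convert an in-memory tar.gz to an in-memory zip, keeping directories."""
    out = io.BytesIO()
    # 'r:gz' (colon, not pipe) opens a seekable archive, so getmembers()
    # can be called first and extractfile() still works afterwards.
    with tarfile.open(fileobj=io.BytesIO(targz_bytes), mode="r:gz") as tarf:
        with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_DEFLATED) as zipf:
            for m in tarf.getmembers():
                if m.isdir():
                    # A zip directory entry is a name ending in '/' with no data.
                    zipf.writestr(m.name.rstrip("/") + "/", b"")
                elif m.isfile():
                    zipf.writestr(m.name, tarf.extractfile(m).read())
                # Links, devices and other member types are skipped.
    return out.getvalue()


# Demo: a tar.gz holding one directory and one file inside it.
payload = b"a,b\n1,2\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tarf:
    d = tarfile.TarInfo(name="pkg")
    d.type = tarfile.DIRTYPE
    tarf.addfile(d)
    f = tarfile.TarInfo(name="pkg/data.csv")
    f.size = len(payload)
    tarf.addfile(f, io.BytesIO(payload))

zip_bytes = targz_bytes_to_zip_bytes(buf.getvalue())
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipf:
    names = zipf.namelist()
    content = zipf.read("pkg/data.csv")
print(names)
```

If the input really is a non-seekable stream, the simpler route is to keep 'r|gz' and skip non-file members inside the loop.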

-
Really appreciate your help here. I need just one more step. I changed zipf = ZipFile.open to zipf = zipfile.ZipFile.open. Now it says "NameError: name 'ZIP_DEFLATED' is not defined" – Geet Sep 01 '16 at 08:43
-
@Geet - I have now tested the code and fixed it. Please let me know if it suits you. Beware of the issue with directories that I mention! – sancho.s ReinstateMonicaCellio May 13 '17 at 14:30
-
Doesn't seem to work on Python 3? I get `'NoneType' object has no attribute 'read'` – mu7z Mar 17 '20 at 05:10
-
This is an old Q&A, I am not certain, but I guess it should work under python3 (and I possibly tested using python3). The error likely comes from `fl = f.read()`. Please check you are correctly opening a tar file, which has only files (to begin with). – sancho.s ReinstateMonicaCellio Mar 17 '20 at 07:46
-
This doesn't work with Python 3.7. It gives me an `AttributeError: 'NoneType' object has no attribute 'read'` error at the line `fl = f.read()`. Any solution for Python 3.7? – Bogota May 22 '20 at 07:45
-
@Bogota - Apparently your `f` is of `NoneType`. Try checking why that is the return value obtained from `tarf.extractfile( m )`. Yours seems the same issue as mu7z had. – sancho.s ReinstateMonicaCellio May 22 '20 at 14:43
This just fixes a couple of tiny issues from the above answer, makes sure the mtime is preserved and makes sure compression is happening on all the files. All credit to the above for the simple answer.
from datetime import datetime
import sys
from tarfile import open
from zipfile import ZipFile, ZIP_DEFLATED, ZipInfo
compresslevel = 9
compression = ZIP_DEFLATED
with open(name=sys.argv[1], mode='r|gz') as tarf:
    with ZipFile(file=sys.argv[2], mode='w', compression=compression, compresslevel=compresslevel) as zipf:
        for m in tarf:
            mtime = datetime.fromtimestamp(m.mtime)
            print(f'{mtime} - {m.name}')
            zinfo: ZipInfo = ZipInfo(
                filename=m.name,
                date_time=(mtime.year, mtime.month, mtime.day, mtime.hour, mtime.minute, mtime.second)
            )
            if not m.isfile():
                # for directories and other types
                continue
            f = tarf.extractfile(m)
            fl = f.read()
            zipf.writestr(zinfo, fl, compress_type=compression, compresslevel=compresslevel)
print('done.')
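One caveat with preserving mtime: the zip format cannot store dates before 1980, and ZipInfo raises a ValueError for them. A small helper along these lines could clamp old timestamps before building the ZipInfo. This is only a sketch; zip_date_time and ZIP_EPOCH are names I made up, not part of zipfile.

```python
from datetime import datetime
from zipfile import ZipInfo

# Earliest timestamp the zip format can store.
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def zip_date_time(mtime):
    """Turn a tar member's mtime (seconds since the epoch) into a ZipInfo date_time."""
    dt = datetime.fromtimestamp(mtime)
    stamp = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)
    # Tuples compare field by field, so max() clamps anything older than 1980.
    return max(stamp, ZIP_EPOCH)


# A member with mtime 0 (1970) would otherwise be rejected by ZipInfo.
old_info = ZipInfo(filename="old.txt", date_time=zip_date_time(0))
print(old_info.date_time)
```

In the loop above, date_time=zip_date_time(m.mtime) would then replace the hand-built tuple.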
