There are several libraries for extracting archive files in Python, such as gzip, zipfile, rarfile, tarfile, and patool. I found patool especially useful because it is cross-format: it can extract almost any type of archive, including the most popular ones such as ZIP, GZIP, TAR and RAR.
Extracting an archive file with patool is as easy as this:
import patoolib
patoolib.extract_archive("Archive.zip", outdir="Folder1")
Here "Archive.zip" is the path of the archive file and "Folder1" is the path of the directory where the extracted file will be stored.
The extraction works fine. The problem is that if I run the same code again for the exact same archive file, an identical extracted file is stored in the same folder but with a slightly different name (filename at the first run, filename1 at the second, filename11 at the third and so on).
Instead, I need the code to overwrite the extracted file if a file with the same name already exists in the directory.
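To make the desired behavior concrete, here is a minimal sketch of overwrite-on-extract for single-file GZ archives using only the standard library (gzip and shutil), not patool. The function name extract_gz_overwrite is hypothetical, and it assumes the output file should be named after the archive with its .gz suffix removed.

```python
import gzip
import os
import shutil


def extract_gz_overwrite(archive_path, outdir="."):
    """Decompress a single-file .gz archive into outdir, replacing any existing file."""
    # Assumption: the output name is the archive name without its .gz suffix.
    base = os.path.basename(archive_path)
    name = base[:-3] if base.lower().endswith(".gz") else base + ".out"
    os.makedirs(outdir, exist_ok=True)
    target = os.path.join(outdir, name)
    # Mode "wb" truncates an existing file, so repeated runs overwrite it
    # instead of creating a, a1, a2, ...
    with gzip.open(archive_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Calling extract_gz_overwrite("a.gz", outdir=".") several times in a row should leave exactly one extracted file next to the archive. This only covers GZ; it is not a cross-format replacement for patool.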
The extract_archive function looks quite minimal: besides these two parameters, it only has a verbosity parameter and a program parameter, which specifies the program you want to extract archives with.
Edits:
Nizam Mohamed's answer documented that the extract_archive function actually overwrites the output. I found that to be only partially true: the function overwrites the output for ZIP files, but not for GZ files, which are what I am after. For GZ files, the function still generates new files.
Edits: Padraic Cunningham's answer suggested using the master source. So, I downloaded that code and replaced my old patool library scripts with the scripts from the link. Here is the result:
os.listdir()
Out[11]: ['a.gz']
patoolib.extract_archive("a.gz",verbosity=1,outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[12]: '.'
patoolib.extract_archive("a.gz",verbosity=1,outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[13]: '.'
patoolib.extract_archive("a.gz",verbosity=1,outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[14]: '.'
os.listdir()
Out[15]: ['a', 'a.gz', 'a1', 'a2']
So, again, the extract_archive function creates a new file every time it is executed. Note that the file archived in a.gz actually has a different name from a.
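For anyone who wants to check which name is stored inside a GZ archive, the following is a rough sketch that reads the optional original-file-name field from the gzip header, following the header layout in RFC 1952. The helper name read_gzip_stored_name is hypothetical, and this says nothing about how patool itself picks the output name.

```python
import struct


def read_gzip_stored_name(path):
    """Return the original file name stored in a gzip header, or None if absent."""
    with open(path, "rb") as f:
        header = f.read(10)  # fixed part: magic, method, flags, mtime, xfl, os
        if len(header) < 10 or header[:2] != b"\x1f\x8b":
            raise ValueError("not a gzip file")
        flags = header[3]
        if flags & 0x04:  # FEXTRA set: skip the extra field
            (xlen,) = struct.unpack("<H", f.read(2))
            f.read(xlen)
        if not flags & 0x08:  # FNAME not set: no name stored
            return None
        name = bytearray()
        while True:
            byte = f.read(1)
            if not byte or byte == b"\x00":  # name is zero-terminated
                break
            name += byte
        return name.decode("latin-1")
```

For example, read_gzip_stored_name("a.gz") returns the stored name if the archive has one, which can then be compared with the names of the files that appear after extraction.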