So basically I have a file system like this:
main_archive.tar.gz
main_archive.tar
sub_archive.xml.gz
actual_file.xml
There are hundreds of files in this archive... So basically, can the gzip
package be used with multiple files in Python 3? I've only used it with a single file zipped so I'm at a loss on how to go over multiple files or multiple levels of "zipping".
My usual method of decompressing is:
with gzip.open(file_path, "rb") as f:
for ln in f.readlines():
*decode encoding here*
Of course, this has multiple problems because usually "f" is just a file... But now I'm not sure what it represents?
Any help/advice would be much appreciated!
EDIT 1:
I've accepted the answer below, but if you're looking for similar code, my backbone was basically:
tar = tarfile.open(file_path, mode="r")
for member in tar.getmembers():
f = tar.extractfile(member)
if verbose:
print("Decoding", member.name, "...")
with gzip.open(f, "rb") as temp:
decoded = temp.read().decode("UTF-8")
e = xml.etree.ElementTree.parse(decoded).getroot()
for child in e:
print(child.tag)
print(child.attrib)
print("\n\n")
tar.close()
Main packages used were gzip
, tarfile
, and xml.etree.ElementTree
.