
I've got a specific problem: I am downloading some large sets of data using requests. Each request returns a compressed (.tar.gz) archive containing a manifest of the download and several folders, each holding a single file.

I can unpack the archive and then remove it, and afterwards move all files out of their subdirectories into one folder and remove the subdirectories.

Is there a way to combine these two steps? Since I'm new to both tasks, I studied some tutorials and Stack Overflow questions on each topic. I'm glad it works, but I'd like to refine my code and, if possible, merge the two steps; I didn't come across this while browsing other material.

So for each set of parameters, I perform a request which ends up with:

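```python
import os
import tarfile

archive_path = os.path.join(file_location, file_name)
# Write the downloaded archive to disk
with open(archive_path, "wb") as output_file:
    output_file.write(response.content)
# Unpack the gzip-compressed tar archive (this recreates the subfolders)
with tarfile.open(archive_path, "r:gz") as tarObj:
    tarObj.extractall(path=file_location)
# Remove the compressed file
os.remove(archive_path)
```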
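If you don't need to keep the .tar.gz on disk, the write-then-delete round trip may be avoidable. A minimal sketch, assuming response.content holds the whole archive in memory (as it does for a plain, non-streamed requests call): wrap the bytes in io.BytesIO and pass them to tarfile.open(fileobj=...). The helper name unpack_bytes is hypothetical, and filter="data" is only passed when the running Python supports tarfile extraction filters.

```python
import io
import tarfile


def unpack_bytes(data: bytes, dest_dir: str) -> None:
    """Extract a gzip-compressed tar archive held in memory into dest_dir.

    Hypothetical helper: no temporary .tar.gz file is written or deleted.
    """
    # Use the safer "data" extraction filter where this Python version has it
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tarObj:
        tarObj.extractall(path=dest_dir, **extra)


# Usage inside the download loop (response comes from requests):
# unpack_bytes(response.content, file_location)
```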

Then, for the next step, I wrote a function that does the following:

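```python
import os
import shutil

target_dir = keyvalue[1]  # target directory is stored in this tuple
all_files_dir = os.path.join(target_dir, "ALL_FILES")
subdirs = [d for d in get_imm_subdirs(target_dir) if d != "ALL_FILES"]  # my helper; skip destination
os.makedirs(all_files_dir, exist_ok=True)
for f in subdirs:
    subdir_path = os.path.join(target_dir, f)
    for name in os.listdir(subdir_path):  # returns a list of names, so move each one
        shutil.move(os.path.join(subdir_path, name), all_files_dir)
    os.rmdir(subdir_path)  # takes a single path; the folder must already be empty
```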
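For comparison, here is a self-contained version of the move step that doesn't depend on get_imm_subdirs (whose code isn't shown). It is a sketch assuming each subfolder contains only plain files; the function name flatten_subdirs is hypothetical, and the default destination name ALL_FILES simply mirrors the snippet above.

```python
import os
import shutil


def flatten_subdirs(target_dir: str, dest_name: str = "ALL_FILES") -> str:
    """Move the contents of target_dir's immediate subfolders into one folder.

    Emptied subfolders are removed; files directly in target_dir are left alone.
    Returns the path of the destination folder.
    """
    dest = os.path.join(target_dir, dest_name)
    # Skip dest itself in case it already exists from an earlier run
    with os.scandir(target_dir) as entries:
        subdirs = [e.path for e in entries if e.is_dir() and e.name != dest_name]
    os.makedirs(dest, exist_ok=True)
    for sub in subdirs:
        for name in os.listdir(sub):
            shutil.move(os.path.join(sub, name), dest)
        os.rmdir(sub)  # succeeds only because the folder is now empty
    return dest
```

Listing the subfolders into a list before creating or moving anything avoids modifying the directory while it is being iterated.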

Is there an action I can perform during the unzip step?

MHeydt

1 Answer


You can extract the files individually rather than using extractall.

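```python
import os
import tarfile

with tarfile.open('musthaves.tar.gz') as tarObj:  # mode "r" auto-detects gzip
    for member in tarObj.getmembers():
        if member.isfile():  # skip directories and other non-file entries
            member.name = os.path.basename(member.name)  # drop the folder part
            tarObj.extract(member, ".")
```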

With appropriate credit to this SO question and the tarfile docs.
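The same loop can write straight into the ALL_FILES folder from the question, which folds the asker's two steps into one. A minimal sketch, assuming file names are unique across the archive's folders (otherwise a later file would replace an earlier one with the same name); extract_flat is a hypothetical name. Note that the manifest, being a regular file, ends up in the destination folder alongside the data files.

```python
import os
import tarfile


def extract_flat(archive_path: str, dest_dir: str) -> list:
    """Extract every regular file in archive_path directly into dest_dir.

    Folder structure inside the archive is discarded. Returns the names written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with tarfile.open(archive_path) as tarObj:  # mode "r" auto-detects gzip
        for member in tarObj.getmembers():
            if member.isfile():
                member.name = os.path.basename(member.name)
                tarObj.extract(member, dest_dir)
                written.append(member.name)
    return written


# Usage with the question's variables (hypothetical paths):
# extract_flat(os.path.join(file_location, file_name),
#              os.path.join(file_location, "ALL_FILES"))
```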

getmembers() returns a list of everything inside the archive as TarInfo objects. You could use getnames() instead, but it only gives you the names as strings, so you'd have to devise your own test of whether each entry is a file or a directory.

isfile(): if the entry isn't a regular file (a directory, for instance), you don't want it.

member.name = os.path.basename(member.name) strips the folder part from each member's path, so the extractor thinks everything is at the top level.
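One consequence of resetting the path this way: if two folders contain files with the same base name, both are extracted to the same path and the later one replaces the earlier. If that can happen in your downloads, a check like the following could run before extracting. duplicate_basenames is a hypothetical helper, not part of tarfile; it uses only getmembers(), isfile() and os.path.basename, as in the answer.

```python
import os
import tarfile
from collections import Counter


def duplicate_basenames(tarObj: tarfile.TarFile) -> list:
    """Return base names shared by more than one regular file in the archive."""
    counts = Counter(
        os.path.basename(m.name) for m in tarObj.getmembers() if m.isfile()
    )
    return sorted(name for name, n in counts.items() if n > 1)


# Usage (archive name from the answer's example):
# with tarfile.open('musthaves.tar.gz') as tarObj:
#     clashes = duplicate_basenames(tarObj)
#     if clashes:
#         raise ValueError(f"name clash after flattening: {clashes}")
```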

Alan
I read the docs and understood just enough to know how to open and extract stuff. I'm not yet used to reading them closely enough to find other useful information. But this should help a lot, thank you! – MHeydt Mar 13 '18 at 09:48