0

I'm looking for a way to extract a specific file (whose name I know) from an archive containing several files, without writing anything to the hard drive.

I tried combining StringIO and zipfile, but I either get the entire archive back or the same error from zipfile (open requires an argument other than a StringIO object).

Needed behaviour:

archive.zip #containing ex_file1.ext, ex_file2.ext, target.ext
extracted_file #the targeted unzipped file

archive.zip = getFileFromUrl("file_url")
extracted_file = extractFromArchive(archive.zip, target.ext)

What I've tried so far:

import zipfile, requests

data = requests.get("file_url")                                 
zfile = StringIO.StringIO(zipfile.ZipFile(data.content))
needed_file = zfile.open("Needed file name", "r").read()
Chris Prolls

2 Answers

1

The standard library includes a module, zipfile, for working with zip archives.

https://docs.python.org/2/library/zipfile.html

You can list the files in an archive:

ZipFile.namelist()

and extract a subset:

ZipFile.extract(member[, path[, pwd]])

EDIT: The question "Python in-memory zip library" covers in-memory zips. TL;DR: zipfile does work with in-memory file-like objects.
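As a rough illustration of the calls above on an in-memory archive, here is a minimal Python 3 sketch. It assumes `io.BytesIO` as the file-like object, and it builds a throwaway archive in memory as a stand-in for downloaded bytes (the member names are borrowed from the question). Note that `ZipFile.extract` writes the member to disk, so the sketch uses `ZipFile.read`, which returns the member's bytes instead.

```python
import io
import zipfile

# Build a small archive entirely in memory (a stand-in for downloaded bytes).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ex_file1.ext", "first")
    archive.writestr("target.ext", "wanted content")

# Reopen the same bytes for reading; nothing is written to disk.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    names = archive.namelist()                 # member names, in archive order
    target_bytes = archive.read("target.ext")  # raw bytes of one member
```
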

NateTheGrate
  • Does it handle stringIO type? – Chris Prolls Sep 05 '18 at 07:15
  • I'm not sure what stringIO type is...that seems to be a library for reading a string buffer. I'm not sure if zipfile supports in-memory decompression, you'd have to do some research, starting with the documentation. – NateTheGrate Sep 05 '18 at 12:07
0

After a few hours of testing, I finally found out why it wasn't working:

I was buffering the ZipFile object instead of buffering the downloaded file itself and then opening that buffer as a ZipFile object, which raised a type error.

Here is how to do it:

import StringIO  # Python 2 module; on Python 3, use io.BytesIO instead
import zipfile
import requests

data = requests.get(url)                                 # Download the archive from the URL
zfile = zipfile.ZipFile(StringIO.StringIO(data.content)) # Wrap the raw bytes in a file-like object, then open it as a zip
filenames = zfile.namelist()                             # List all member names
for name in filenames:
    if name == "Needed file name":                       # Check that the file is present
        needed_file = zfile.open(name, "r").read()       # Read the needed file's content
        break
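For Python 3, where the `StringIO` module no longer exists, the same idea might look like the sketch below. It is not the code from this answer: `extract_member` is a hypothetical helper name, and it assumes the archive is already available as bytes (for example `requests.get(url).content`). Instead of looping over `namelist()`, it relies on `ZipFile.read` raising `KeyError` when the member is missing.

```python
import io
import zipfile


def extract_member(archive_bytes, member_name):
    """Return the bytes of member_name from a zip held in memory, or None if absent.

    Hypothetical helper for illustration; nothing is written to disk.
    """
    # Wrap the raw bytes in a file-like object, then open that as a zip archive.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zfile:
        try:
            return zfile.read(member_name)
        except KeyError:
            # ZipFile.read raises KeyError when the archive has no such member.
            return None
```

With requests, a call could then look like `extract_member(requests.get(url).content, "target.ext")`.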
Chris Prolls