I'm working with zipped files in Python for the first time, and I'm stumped.

I read the documentation for zipfile, but I'm not sure what would be the best way to do what I'm trying to do. I have a zipped folder with CSV files inside, and I'd like to be able to open the zip file, and retrieve certain values from the csv files inside.

Do I use zipfile.extract(file name here) to bring it to the current working directory? And if I do that, do I just use the file name to work with the file, or does this index or list them differently?

Currently, I manually extract all files in the zipped folder to the current working directory for my project, and then use the csv module to read them. All I'm really trying to do is remove that step.

Any and all help would be greatly appreciated!

clarktwain
  • https://stackoverflow.com/questions/3451111/unzipping-files-in-python. As the answer to this question shows, the argument to `extract` method is not the file name but the directory to extract to. – Woody Pride Nov 27 '17 at 19:10
  • You should experiment a bit and see what happens. – wwii Nov 27 '17 at 19:30

1 Answer

You are looking to avoid extracting to disk. The zipfile docs describe ZipFile.open(), which gives you a file-like object: an object that mostly behaves like a regular file on disk, but reads the member's data directly from the archive without extracting it. In Python 3, reading from it returns bytes, not str.

Something like this...

import csv
import io
from zipfile import ZipFile


with ZipFile('abc.zip') as myzip:
    print(myzip.namelist())  # names of all members in the archive
    for name in myzip.namelist():
        with myzip.open(name) as myfile:
            mc = myfile.read()  # bytes in Python 3
            # decode to text and wrap it so csv.reader can parse it
            c = io.StringIO(mc.decode(), newline='')
            for row in csv.reader(c):
                print(row)

The Python documentation is actually quite good once you have learned how to find things in it and picked up some of the basic programming terms it uses. The csv module works on text, not bytes, in Python 3, hence the extra step of decoding the data and wrapping it in io.StringIO. Note that StringIO and BytesIO live in the io module, not in csv.
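
If the CSV files are large, or you only need certain values from one of them, a variation is to wrap the object returned by ZipFile.open() in io.TextIOWrapper so it is decoded as it is read instead of all at once. This is only a sketch: the helper names iter_csv_rows and column_values are made up for illustration, and it assumes the files are UTF-8 encoded and have a header row.

```python
import csv
import io
from zipfile import ZipFile


def iter_csv_rows(zip_path, member, encoding="utf-8"):
    """Yield each row of one CSV member as a dict keyed by the header row.

    Hypothetical helper; assumes the member is text in the given encoding.
    """
    with ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # raw yields bytes; TextIOWrapper decodes it as csv reads it
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)


def column_values(zip_path, member, column):
    """Collect the values of one column from a CSV member of the archive."""
    return [row[column] for row in iter_csv_rows(zip_path, member)]
```

Something like column_values('abc.zip', 'data.csv', 'score') would then return that column as a list of strings, where 'data.csv' and 'score' stand in for whatever member and column names your archive actually contains. Nothing is written to the working directory.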

ahed87