I have hundreds of zipped CSV files. This is great because they take very little space, but when it is time to use them I have to free up space on my hard drive and unzip them before I can process them. I was wondering whether it is possible with Python (or the Linux command line) to unzip a file while reading it. In other words, I would like to open a zip file, start decompressing it, and process the contents as they are decompressed.

That way there would be no need for extra space on my drive. Any ideas or suggestions?

user1214120
  • I have the same problem as this guy: http://stackoverflow.com/questions/3170625/unzip-file-while-reading-it, but on Linux – user1214120 Jul 19 '13 at 23:34
  • http://stackoverflow.com/questions/2018512/reading-tar-file-contents-without-untarring-it-in-python-script – seth Jul 19 '13 at 23:36
  • Take a look at the [zipfile module](http://docs.python.org/2/library/zipfile), I think it may be what you are looking for. – Andrew Clark Jul 19 '13 at 23:36
  • `zcat file | grep | awk '{....}' | etc | etc | sort | etc` ? Good luck. – shellter Jul 19 '13 at 23:53

2 Answers

While it's entirely possible to open ZIP files in Python, it is also possible to handle this transparently using a filesystem extension. Whether this is preferable depends on various factors, including system access and solution portability.

See Fuse-Zip:

With fuse-zip you really can work with ZIP archives as real directories. Unlike KIO or Gnome VFS, it can be used in any application without modifications.

Or AVFS: A Virtual File System:

AVFS is a system, which enables all programs to look inside gzip, tar, zip, etc. files or view remote (ftp, http, dav, etc.) files, without recompiling the programs.

Note that these solutions are system-specific and rely on FUSE. There might be similar transparent solutions for Windows, but that would require a separate investigation for that system.
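Once an archive is mounted this way, its members look like ordinary files, so no ZIP-specific code is needed. The following is only a rough Python 3 sketch: the function name `iter_mounted_csv_rows` and the mount point `/mnt/archive` are hypothetical, and it assumes the archive has already been mounted there by one of the tools above.

```python
import csv
import os


def iter_mounted_csv_rows(mount_point):
    """Yield (path, row) for every .csv file found under mount_point.

    mount_point is assumed to be a directory where an archive is
    already mounted (for example through a FUSE-based tool).
    """
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for filename in sorted(filenames):
            if not filename.lower().endswith(".csv"):
                continue
            path = os.path.join(dirpath, filename)
            # newline="" lets the csv module handle line endings itself.
            with open(path, newline="") as handle:
                for row in csv.reader(handle):
                    yield path, row
```

It could then be used as `for path, row in iter_mounted_csv_rows('/mnt/archive'): ...`, processing one row at a time.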

user2246674

Python has shipped the zipfile module since version 1.6 for exactly this kind of situation (the ZipFile.open method was added in 2.6). Example usage:

import csv
import zipfile

with zipfile.ZipFile('myarchive.zip') as archive:
    with archive.open('the_zipped_file.csv') as fin:
        reader = csv.reader(fin)  # pass delimiter/dialect options here if needed
        for record in reader:
            pass  # process record here

Note that in Python 3 things get a bit more complicated, because the file-like object returned by archive.open yields bytes, while csv.reader expects strings. You can write a simple class that converts bytes to strings using a given encoding:

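class EncodingConverter:
    """Iterator that decodes each bytes line of fobj into a str."""
    def __init__(self, fobj, encoding):
        self._iter_fobj = iter(fobj)
        self._encoding = encoding
    def __iter__(self):
        return self
    def __next__(self):
        # Decode one line at a time, so the file is never fully loaded.
        return next(self._iter_fobj).decode(self._encoding)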

and use it like:

import csv
import zipfile

with zipfile.ZipFile('myarchive.zip') as archive:
    with archive.open('the_zipped_file.csv') as fin:
        reader = csv.reader(EncodingConverter(fin, 'utf-8'))  # add dialect options if needed
        for record in reader:
            pass  # process record here
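
As a possible alternative in Python 3, the standard library's io.TextIOWrapper can do the bytes-to-text decoding instead of a hand-written class. The sketch below is only an illustration: the function name `iter_zipped_csv_rows` is made up for this example, and the UTF-8 default encoding is an assumption about the data. It walks every .csv member of one archive and yields rows as they are decompressed, so nothing is extracted to disk.

```python
import csv
import io
import zipfile


def iter_zipped_csv_rows(zip_path, encoding="utf-8"):
    """Yield (member_name, row) for every .csv member of a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Wrap the binary stream so csv.reader receives str lines;
                # newline="" lets the csv module handle line endings itself.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                for row in csv.reader(text):
                    yield name, row
```

To cover hundreds of archives, this could be called in a loop over something like `glob.glob('*.zip')`, with each row processed as it arrives.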
Bakuriu