Is there any way to make a file inside a zip file seekable in Python without reading it into memory?

I tried the obvious procedure but I get an error since the file is not seekable:

In [74]: inputZipFile = zipfile.ZipFile("linear_g_LAN2A_F_3keV_1MeV_30_small.zip", 'r')

In [76]: inputCSVFile = inputZipFile.open(inputZipFile.namelist()[0], 'r')   

In [77]: inputCSVFile
Out[77]: <zipfile.ZipExtFile at 0x102f5fad0>

In [78]: inputCSVFile.se
inputCSVFile.seek      inputCSVFile.seekable  

In [78]: inputCSVFile.seek(0)
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-78-f1f9795b3d55> in <module>()
----> 1 inputCSVFile.seek(0)

UnsupportedOperation: seek
Dharman
jbssm

2 Answers

There is no way to do so for all zip files. DEFLATE is a stream compression algorithm, which means there is no way to decompress an arbitrary part of the file without first decompressing everything before it. Seeking could possibly be implemented for entries that are stored without compression, but then you end up in the awkward position where some entries are seekable and others aren't.
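
A minimal sketch of what this means in practice, assuming a DEFLATE-compressed entry: to reach a given offset you decompress and throw away everything before it, one chunk at a time, so memory use stays bounded even though the work does not. The helper `read_at`, the member name `data.csv`, and the in-memory demo archive are all made up for illustration. As far as I know, Python 3.7 and later add `seek()` to `ZipExtFile` when the underlying file is seekable, but it still has to decompress everything up to the target position.

```python
import io
import zipfile


def read_at(archive, member, offset, size, chunk_size=64 * 1024):
    """Return up to `size` bytes starting at `offset` of an archive member.

    Hypothetical helper: it emulates a seek by decompressing and
    discarding everything before `offset`, one chunk at a time.
    """
    with archive.open(member, "r") as stream:
        remaining = offset
        while remaining > 0:
            skipped = stream.read(min(chunk_size, remaining))
            if not skipped:  # offset lies past the end of the member
                return b""
            remaining -= len(skipped)
        return stream.read(size)


# Build a small DEFLATE-compressed archive in memory for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.csv", "".join(f"{i},{i * i}\n" for i in range(1000)))

# Read 8 bytes starting at byte 4 without loading the whole member.
with zipfile.ZipFile(buffer, "r") as zf:
    chunk = read_at(zf, "data.csv", offset=4, size=8)
```

Every call reopens the member and starts decompressing from the beginning, so the cost grows with the offset. That is the price of "seeking" in a compressed stream.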

Ignacio Vazquez-Abrams
  • I see, thank you. But from what I'm searching, it's possible with tar files, correct? – jbssm Oct 10 '12 at 14:52
  • Only if the tar file is uncompressed. As soon as you throw in gzip (DEFLATE) compression, you get the same problem. – Ignacio Vazquez-Abrams Oct 10 '12 at 14:54
  • Although it happens on the fly, I can use a gzip-compressed tar file and seek inside it. Python seems to be decompressing it either in memory or somewhere on disk in a tmp location, and the process takes a lot of time compared to an uncompressed file: about 1 min vs 4 seconds for the example I'm trying. Thank you for all the help. – jbssm Oct 10 '12 at 15:29
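
For the uncompressed tar case discussed in these comments, a rough sketch of what seeking looks like: `tarfile.extractfile()` returns a file-like object, and with an uncompressed archive a `seek()` on it can jump straight to the requested offset in the underlying file. The member name `data.csv` and the in-memory archive are assumptions made up for the demo.

```python
import io
import tarfile

# Demo payload: 1000 short CSV lines ("0,0", "1,1", "2,4", ...).
payload = "".join(f"{i},{i * i}\n" for i in range(1000)).encode()

# Write an uncompressed tar archive into memory (mode "w" = no compression).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tf:
    info = tarfile.TarInfo(name="data.csv")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# Reopen it, explicitly without compression ("r:"), and seek in the member.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:") as tf:
    member = tf.extractfile("data.csv")
    member.seek(4)          # skip the first line, "0,0\n"
    chunk = member.read(8)  # the next two lines
```

With `mode="r:gz"` the same calls still work, but the gzip layer has to decompress up to the target offset first, which would explain the large timing difference reported above.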