6

So let's say I have a zip file with modules/classes inside. I read this file in binary mode ("rb") to store it in memory. How would I take this in-memory zip file and load a module from it? Would I need to write an import hook for this? One cannot simply run exec on binary zip data from memory, can one?

I know it's simple to load a module from a plain zip file on disk, as this is done automatically by Python 2.7. However, I want to know whether this is possible from memory.

Update: A lot of people are mentioning importing the zip from disk. The issue is specifically that I want to import the zip from memory, NOT from disk. I will obviously read it from disk into memory byte by byte. I want to take all the bytes in memory that make up the zip file and use them as a regular import.

efel
  • If you can get `exec` to read the in-memory data as a text string then I don't see why it wouldn't work. For example: http://stackoverflow.com/questions/436198/what-is-an-alternative-to-execfile-in-python-3-0 – djhoese Aug 25 '16 at 02:25
  • Right, but in the case of an in-memory zip file it is not actual code being executed, it's binary data (the zip file itself). To my understanding exec only works on code and/or strings. – efel Aug 25 '16 at 02:42
  • Ah ok, so the binary zip data you have in memory is still compressed? I suppose the point of your question is that you don't want to have to uncompress it? Or at least not uncompress it yourself. I guess my next step would be looking at how setuptools/distutils do it for compressed eggs. – djhoese Aug 25 '16 at 02:46
  • What is the reason behind it? Can't you use an egg file instead? It is basically a zip with metadata and Python can use it natively. – Klaus D. Aug 25 '16 at 02:48
  • You could try a combination of this (http://stackoverflow.com/questions/14191900/pythonimport-module-from-memory) and the `__import__` or `importlib` or whatever your version of python uses. – djhoese Aug 25 '16 at 02:49
  • I did look at that link, but that is loading modules directly from a string. However, I want to load the module from a compressed zip as if it were on disk (but it is in memory). You see how they perform the exec on that string; I'm not sure it would work on a zip file. Python somehow has a loader that loads modules from zip files automagically. I was wondering if anyone has attempted to write a hook for that yet? About the .egg file: if this is the same thing as a .zip, wouldn't I still have the same issue, being compressed and all? – efel Aug 25 '16 at 02:54
  • If you are reading from a .zip file on disk, it's on disk. – l'L'l Aug 25 '16 at 03:02
  • This link (http://stackoverflow.com/questions/2886850/how-to-import-a-zip-file-to-my-py) describes loading zip files as normal directories. My guess is that if you use `importlib` or another import mechanism provided by the standard library, it should just accept the data and magically uncompress it. You just have to find a way to make the importer know it's a zip file. If the file is already on disk then just import it. If you are doing some weird server/client sending of Python code as a zip file then you'll probably need something fancier like in my previous link. – djhoese Aug 25 '16 at 03:03
  • Put another way: Python can import from zip files on disk. If you have a zip file on disk then just import it. If you *only* have it in memory then make the importers (importlib) think that you are passing them an on-disk zip file. – djhoese Aug 25 '16 at 03:05
  • All these importlib methods assume that the file is already on disk. So if I use any of these methods with the name of my module (my zip file) it will simply use the file on disk and NOT the zip in memory. – efel Aug 25 '16 at 03:11
  • The `imp` module should help. Right now I'm trying a simple example but am having trouble decompressing the in-memory zip file. – djhoese Aug 25 '16 at 03:33
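
The distinction discussed in these comments (exec works on source text, not on compressed bytes) can be sketched as follows. This is a minimal Python 3 sketch under stated assumptions, not code from the thread: the archive is built in memory so the example is self-contained, and the names `demo_mod`, `build_demo_zip` and `load_module_from_zip_bytes` are hypothetical. It handles a single top-level module only and does not hook into the import system.

```python
import io
import types
import zipfile


def build_demo_zip():
    """Build a small zip archive entirely in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo_mod.py", "def greet():\n    return 'hello from memory'\n")
    return buf.getvalue()


def load_module_from_zip_bytes(zip_bytes, module_name):
    """Decompress one top-level module from zip bytes and execute its source."""
    # ZipFile accepts any seekable file-like object, so BytesIO stands in for a file.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        source = zf.read(module_name + ".py")  # decompressed source, as bytes
    module = types.ModuleType(module_name)
    # exec runs the decompressed source, never the raw zip bytes.
    exec(compile(source, module_name + ".py", "exec"), module.__dict__)
    return module


demo = load_module_from_zip_bytes(build_demo_zip(), "demo_mod")
greeting = demo.greet()
```

Because the module is never registered in `sys.modules`, a plain `import demo_mod` elsewhere would still fail; that is the gap an import hook fills.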

2 Answers

10

EDIT: Fixed the ZipImporter to work for everything (I think)

Test Data:

mkdir mypkg
vim mypkg/__init__.py
vim mypkg/test_submodule.py

__init__.py Contents:

def test():
    print("Test")

test_submodule.py Contents:

def test_submodule_func():
    print("This is a function")

Create Test Zip (on mac):

zip -r mypkg.zip mypkg
rm -r mypkg # don't want to accidentally load the directory

Special zip import in inmem_zip_importer.py:

import imp
import sys
import zipfile

class ZipImporter(object):
    def __init__(self, zip_file):
        # zip_file may be a path or any file-like object (e.g. io.BytesIO)
        self.z = zip_file
        self.zfile = zipfile.ZipFile(self.z)
        self._paths = [x.filename for x in self.zfile.filelist]

    def _mod_to_paths(self, fullname):
        # zip member names always use "/" as the separator, never os.sep
        base = fullname.replace(".", "/")
        # path if fullname is a plain module
        py_filename = base + ".py"
        # path if fullname is a package/subpackage
        py_package = base + "/__init__.py"
        if py_filename in self._paths:
            return py_filename
        elif py_package in self._paths:
            return py_package
        else:
            return None

    def find_module(self, fullname, path=None):
        if self._mod_to_paths(fullname) is not None:
            return self
        return None

    def load_module(self, fullname):
        filename = self._mod_to_paths(fullname)
        if filename is None:
            raise ImportError(fullname)
        new_module = imp.new_module(fullname)
        new_module.__file__ = filename
        new_module.__loader__ = self
        if filename.endswith("__init__.py"):
            new_module.__path__ = []
            new_module.__package__ = fullname
        else:
            new_module.__package__ = fullname.rpartition('.')[0]
        # register before executing so imports inside the module can resolve it
        sys.modules[fullname] = new_module
        try:
            # this call form of exec works on both Python 2 and Python 3
            exec(self.zfile.open(filename, 'r').read(), new_module.__dict__)
        except Exception:
            sys.modules.pop(fullname, None)
            raise
        return new_module

Use:

In [1]: import sys
In [2]: from inmem_zip_importer import ZipImporter
In [3]: sys.meta_path.append(ZipImporter(open("mypkg.zip", "rb")))
In [4]: from mypkg import test
In [5]: test()
Test
In [6]: from mypkg.test_submodule import test_submodule_func
In [7]: test_submodule_func()
This is a function

(from efel) one more thing... :

To read directly from memory, one would need to do this:

import io
import zipfile

# read the binary data; after this, the whole zip is in memory
with open("mypkg.zip", "rb") as f:
    data = f.read()
# the file is closed on leaving the with block (important!)

# at this point we can essentially delete the actual on-disk zip file

# convert the in-memory bytes to a file-like object
zipbytes = io.BytesIO(data)

zipfile.ZipFile(zipbytes)
efel
djhoese
  • This was fun. I have never done anything like this before but google helped. Thanks to @junfengshou for showing me that the zipfile module existed. – djhoese Aug 25 '16 at 04:25
  • Also note that there is a `zipimport` package, but I don't think it works on open file objects (https://docs.python.org/2/library/zipimport.html). – djhoese Aug 25 '16 at 06:27
  • Ah, very good answer. I love how you made the module import hook. Great work! I would like to add a comment. Here you are not reading the zip file directly from memory but still reading it from disk. To read directly from memory you would have to read the entire binary contents into a variable and store them in a BytesIO object, like here: http://stackoverflow.com/questions/2463770/python-in-memory-zip-library Also make sure to close the file after opening it, and rename the file to test whether it's really coming from memory. Marking this answer as the solution! Once again, great work! – efel Aug 25 '16 at 11:03
  • I have a couple of issues with this one I was hoping that you could clarify. After translating the `exec` to Python 3, it still didn't work for me until I added `sys.modules[fullname] = new_module` at the end. Is that an oversight? I'm also not sure if for python3 it shouldn't use `types.ModuleType` instead of `imp.new_module`. [This article](http://xion.org.pl/2012/05/06/hacking-python-imports/) would seem to confirm that at least partially. – Bartek Banachewicz Dec 05 '17 at 08:48
  • @BartekBanachewicz I would guess everything you are saying is correct. Without spending too much time researching I'm not surprised by the `sys.modules` line given https://stackoverflow.com/a/15087355/433202 and since `imp` has been deprecated since Python 3.4 any alternative is probably the smarter long term solution. – djhoese Dec 05 '17 at 14:17
  • @daveydave400 I have a few other possible improvements. Do you mind if I edit them into the answer? – Bartek Banachewicz Dec 05 '17 at 14:59
  • Go for it, but try to keep the original python 2 functionality there for reference. A new zip importer chunk of code at the bottom would be nice. – djhoese Dec 05 '17 at 15:02
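
For the Python 3 points raised in these comments, one possible shape is sketched below. It is an assumption-laden sketch rather than the answer author's code: it swaps `find_module`/`load_module` and `imp` for the `find_spec`/`exec_module` protocol from `importlib`, in which the import system itself registers the module in `sys.modules`. The class name `InMemoryZipFinder` and the package `memdemo_pkg` are hypothetical, and only plain `.py` members are handled (no bytecode, no extension modules, no resource loading).

```python
import importlib
import importlib.abc
import importlib.util
import io
import sys
import zipfile


class InMemoryZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader for .py modules/packages stored in in-memory zip bytes."""

    def __init__(self, zip_bytes):
        self._zfile = zipfile.ZipFile(io.BytesIO(zip_bytes))
        self._names = set(self._zfile.namelist())

    def _locate(self, fullname):
        # Zip member names always use "/" regardless of platform.
        base = fullname.replace(".", "/")
        if base + "/__init__.py" in self._names:
            return base + "/__init__.py", True
        if base + ".py" in self._names:
            return base + ".py", False
        return None, False

    def find_spec(self, fullname, path=None, target=None):
        member, is_package = self._locate(fullname)
        if member is None:
            return None  # let the other finders try
        return importlib.util.spec_from_loader(
            fullname, self, origin=member, is_package=is_package
        )

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        member, _ = self._locate(module.__name__)
        source = self._zfile.read(member)
        exec(compile(source, member, "exec"), module.__dict__)


# Hypothetical test data: a package with one submodule, zipped in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("memdemo_pkg/__init__.py", "def test():\n    return 'Test'\n")
    zf.writestr("memdemo_pkg/sub.py", "def func():\n    return 'This is a function'\n")

sys.meta_path.append(InMemoryZipFinder(buf.getvalue()))
pkg = importlib.import_module("memdemo_pkg")
sub = importlib.import_module("memdemo_pkg.sub")
```

Appending to `sys.meta_path` means the regular finders run first, so a same-named package on disk would win; inserting at position 0 would reverse that priority.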
-2

Read test.txt from a zip (without unzipping it or writing to disk):

Python 3 (if you use Python 2.x, change the Python 3 zip API calls to their Python 2 equivalents):

import zipfile

file_rst = []
with zipfile.ZipFile('/test/xxx.zip') as my_zip:
    with my_zip.open('test.txt') as my_file:
        for line in my_file:
            file_rst.append(line.strip())
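
Since the question is about zip data that is already in memory, the same loop can be pointed at bytes instead of a path. A minimal Python 3 sketch, assuming the archive bytes are in a variable (here the hypothetical `zip_bytes`, built in memory so the example is self-contained):

```python
import io
import zipfile

# Build a tiny archive in memory (hypothetical test data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test.txt", "first line\nsecond line\n")
zip_bytes = buf.getvalue()

file_rst = []
# ZipFile accepts a file-like object in place of a filename.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as my_zip:
    with my_zip.open("test.txt") as my_file:
        for line in my_file:
            # Members are read as bytes; decode to get text.
            file_rst.append(line.strip().decode("utf-8"))
```

This reads a data file only; importing code from the archive still needs an import hook.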