3

I have a package containing subpackages, only one of which I need to import at runtime - but I need to test that all of them are valid. Here is my folder structure:

game/
 __init__.py
 game1/
   __init__.py
   constants.py
   ...
 game2/
   __init__.py
   constants.py
   ...

For now the code that runs on boot does:

import pkgutil
import game as _game
# Detect the known games
for importer,modname,ispkg in pkgutil.iter_modules(_game.__path__):
    if not ispkg: continue # game support modules are packages
    # Equivalent of "from game import <modname>"
    try:
        module = __import__('game',globals(),locals(),[modname],-1)
    except ImportError:
        deprint(u'Error in game support module:', modname, traceback=True)
        continue
    submod = getattr(module,modname)
    if not hasattr(submod,'fsName') or not hasattr(submod,'exe'): continue
    _allGames[submod.fsName.lower()] = submod

but this has the disadvantage that all the subpackages are imported, which in turn imports the other modules in each subpackage (constants.py etc.), amounting to a few megabytes of garbage. So I want to substitute this code with a test that the submodules are valid (that they would import fine). I guess I should be using eval somehow - but how? Or what should I do?

EDIT: tldr;

I am looking for an equivalent to the core of the loop above:

    try:
        probably_eval(game, modname) # fails iff `from game import modname` fails
        # but does _not_ import the module
    except: # I'd rather have a more specific error here but methinks not possible
        deprint(u'Error in game support module:', modname, traceback=True)
        continue

So I want a clear answer as to whether an exact equivalent to the import statement, vis-à-vis error checking, exists - without importing the module. That's my question; a lot of answerers and commenters answered different questions.
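For reference, the eval idea floated in the comments can be sketched as below (`module_would_import` is a made-up name; this is not an exact equivalent - there is no way to reproduce import's error behaviour without actually running the module's top-level code, here in a throwaway namespace):

```python
# Sketch of the throwaway-namespace idea: compile and exec the module's
# source in a scratch dict. This catches syntax errors AND import-time
# errors, but it DOES run the module's top-level code (side effects
# included), and relative imports inside the module will not resolve.
def module_would_import(path, modname):
    with open(path) as f:
        source = f.read()
    try:
        code = compile(source, path, 'exec')
        exec(code, {'__name__': modname})   # throwaway globals dict
        return True
    except Exception:
        return False
```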

Mr_and_Mrs_D
  • something like `python -m py_compile script.py`? – fedepad Jan 27 '17 at 15:42
  • I need to do this from within the running program, as stated – Mr_and_Mrs_D Jan 27 '17 at 15:43
  • or `python -m compileall` ? – fedepad Jan 27 '17 at 15:43
  • Yes, but you should be able to use it from inside the program, loading it as a module: https://docs.python.org/2/library/py_compile.html – fedepad Jan 27 '17 at 15:45
  • @fedepad: is this _the one right way_ ? (already I had two answers both deleted...) – Mr_and_Mrs_D Jan 27 '17 at 15:48
  • You cannot check a module's validity without importing it. You can import it into a different Python interpreter (see above), or into your own (see importlib). In the latter case, all side effects that the module may inflict at import time, e.g. monkey-patching something, doing arbitrary I/O, etc, _will happen to your interpreter_, even if the module itself won't be included in your namespace. – 9000 Jan 27 '17 at 15:49
  • @9000 - what if I eval providing a throw away dict ? – Mr_and_Mrs_D Jan 27 '17 at 15:50
  • @9000 - what about this: http://stackoverflow.com/a/41897538/281545 – Mr_and_Mrs_D Jan 27 '17 at 16:06
  • @Mr_and_Mrs_D: Compiling is a good step, but it verifies a file to a lesser extent, due to lack of many static (compile-time) checks in Python. You can successfully compile a file that will bomb with an `AttributeError` or an `ArithmeticError` or a `KeyError`, etc, at import time. OTOH mere importing does not guarantee that imported functions will not crash at runtime anyway. – 9000 Jan 27 '17 at 16:12
  • @Mr_and_Mrs_D: `eval` providing a throwaway dict preserves your interpreter's namespace, a good idea! OTOH it does not stop the module being eval-ed from doing arbitrary I/O, at least, unless you are very defensive (which is hard). It depends on how much sandboxing you need, e.g. importing code downloaded from the internet vs doing a sanity check for code you mostly trust. – 9000 Jan 27 '17 at 16:25
  • @MoinuddinQuadri: why did you edit out the `eval` tag? It's clearly relevant - see the comments above. Actually the compile path as mentioned is probably _an inferior way of checking for validity_ as it will miss the errors mentioned – Mr_and_Mrs_D Jan 27 '17 at 16:40
  • @9000 - `OTOH mere importing does not guarantee that imported functions will not crash at runtime anyway` - lol of course - I am not looking for a magic method that will verify my program is bug free - just _for the equivalent to the above code_ - the equivalent code should pass ___iff___ above passes. `vs doing a sanity check for code you mostly trust` - see above - I want the equivalent code to the above `try: import except: print 'error'; continue` – Mr_and_Mrs_D Jan 27 '17 at 16:45
  • @9000 - custom importer time - still perilous - just a poc: http://stackoverflow.com/a/43700205/281545 – Mr_and_Mrs_D Apr 29 '17 at 20:37

5 Answers

1

If you want to compile the file without importing it (in the current interpreter), you may use py_compile.compile:

>>> import py_compile

# valid python file
>>> py_compile.compile('/path/to/valid/python/file.py')

# invalid python file
>>> py_compile.compile('/path/to/in-valid/python/file.txt')
Sorry: TypeError: compile() expected string without null bytes

The above code writes the error to sys.stderr. In case you want it to raise an exception instead, you have to set doraise to True (default False). Hence, your code becomes:

from py_compile import compile, PyCompileError

try:
    compile('/path/to/valid/python/file.py', doraise=True)
    valid_file = True
except PyCompileError:
    valid_file = False

As per py_compile.compile's documentation:

Compile a source file to byte-code and write out the byte-code cache file. The source code is loaded from the file named file. The byte-code is written to cfile, which defaults to file + 'c' ('o' if optimization is enabled in the current interpreter). If dfile is specified, it is used as the name of the source file in error messages instead of file. If doraise is true, a PyCompileError is raised when an error is encountered while compiling file. If doraise is false (the default), an error string is written to sys.stderr, but no exception is raised.

Check to make sure the compiled module is not imported (in current interpreter):

>>> import py_compile, sys
>>> py_compile.compile('/path/to/main.py')

>>> print [key for key in locals().keys() if isinstance(locals()[key], type(sys)) and not key.startswith('__')]
['py_compile', 'sys']  # main not present
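To illustrate the caveat raised in the question's comments, here is a hedged sketch (the imported module name is made up): a file that compiles cleanly can still blow up at import time, so py_compile.compile is a strictly weaker check than an actual import.

```python
# A file that compiles cleanly can still fail at import time, so
# py_compile.compile is a weaker check than an actual import.
import os
import py_compile
import tempfile

fd, path = tempfile.mkstemp(suffix='.py')
cfile = path + 'c'   # control where the byte-code file is written
try:
    with os.fdopen(fd, 'w') as f:
        # Valid syntax, but importing this would raise ImportError:
        f.write('import makebelieve_module_that_does_not_exist\n')
    py_compile.compile(path, cfile=cfile, doraise=True)  # no exception raised
    print('compiled OK despite the bad import')
finally:
    for p in (path, cfile):
        if os.path.exists(p):
            os.remove(p)
```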
Moinuddin Quadri
  • @Mr_and_Mrs_D Is this what you need? – Moinuddin Quadri Jan 27 '17 at 15:49
  • Are you sure this does not add the module to `sys.modules` ? – Mr_and_Mrs_D Jan 27 '17 at 15:52
  • @Mr_and_Mrs_D Check the edit. I made a small test script to verify that, and as you can see it is not imported when compiled – Moinuddin Quadri Jan 27 '17 at 15:59
  • Will test in my setup when I get back – Mr_and_Mrs_D Jan 27 '17 at 16:02
  • This requires that you know the difference between 'foo.py' and 'foo/__init__.py', that you are 100% sure the file will execute on the Python path, that all import requirements are met, and that the module executes - e.g. that importing won't hit a bare "raise ValueError()". – Jonathan Vanasco Jan 28 '17 at 01:30
  • @JonathanVanasco - as I commented on your answer I have code in the OP that walks the packages explicitly. However compile is still not equivalent to an import - see: [You can successfully compile a file that will bomb with an AttributeError or an ArithmeticError or a KeyError, etc, _at import time_](http://stackoverflow.com/questions/41897470/how-can-i-check-on-runtime-that-a-python-module-is-valid-without-importing-it?noredirect=1#comment70978929_41897470) - emphasis mine – Mr_and_Mrs_D Jan 28 '17 at 13:08
1

Maybe you're looking for the py_compile or compileall modules.
Here is the documentation:
https://docs.python.org/2/library/py_compile.html
https://docs.python.org/2/library/compileall.html#module-compileall

You can load the one you want as a module and call it from within your program.
For example:

import py_compile

try:
    py_compile.compile(your_py_file, doraise=True)
    module_ok = True
except py_compile.PyCompileError:
    module_ok = False
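Extending this idea to a whole directory tree, compileall.compile_dir compiles every .py file under a tree and returns a true value only if all of them compiled (Python 3 shown; a throwaway directory stands in for the question's game/ package):

```python
# compileall.compile_dir vets every .py under a directory; the return
# value is truthy only if all files compiled.
import compileall
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
try:
    with open(os.path.join(tmp, 'good.py'), 'w') as f:
        f.write('x = 1\n')
    with open(os.path.join(tmp, 'bad.py'), 'w') as f:
        f.write('def broken(:\n')              # syntax error
    ok = compileall.compile_dir(tmp, quiet=2)  # quiet=2 also suppresses error output
    print('all compiled:', bool(ok))           # all compiled: False
finally:
    shutil.rmtree(tmp)
```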
fedepad
0

You can't really do what you want efficiently. In order to see if a package is "valid", you need to run it -- not just check if it exists -- because it could have errors or unmet dependencies.

Using py_compile or compileall only tests whether you can compile a Python file, not whether you can import a module. There is a big difference between the two.

  1. That approach requires you to know the actual file structure of the modules -- `import foo` could correspond to /foo.py or /foo/__init__.py.
  2. That approach doesn't guarantee the module is on your interpreter's Python path, or that it is the module your interpreter would load. Things get tricky if you have multiple versions in /site-packages/, or if Python is looking in one of the many possible places for a module.
  3. Just because your file "compiles" doesn't mean it will "run". As a package it could have unmet dependencies or even raise errors.

Imagine this is your python file:

 from makebelieve import nothing
 raise ValueError("ABORT")

The above will compile, but if you import it, it will raise an ImportError if you don't have makebelieve installed, and a ValueError if you do.

My suggestions are:

  1. Import the package, then unload the modules. To unload them, just iterate over the keys in sys.modules. If you're worried about the external modules your packages load, you could override import to log them. An example of this is in a terrible profiling package I wrote: https://github.com/jvanasco/import_logger [I forgot where I got the idea to override import from. Maybe celery?] As some noted, unloading modules is entirely dependent on the interpreter -- but pretty much every option you have has many drawbacks.

  2. Use subprocess to spin up a new interpreter via popen, i.e. popen('python', '-m', 'module_name'). This has a lot of overhead if you do it for every needed module (the cost of an interpreter and the imports each time), but you could write a ".py" file that imports everything you need and just try to run that. In either case you would have to analyze the output, as importing a "valid" package could cause acceptable errors during execution. I can't recall if the subprocess inherits your environment vars or not, but I believe it does. The subprocess is an entirely new operating-system process/interpreter, so the modules are loaded into that short-lived process's memory.
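Suggestion 2 can be sketched as follows (Python 3 shown; `import_ok` is a hypothetical helper name, and `game.game1` is the question's layout):

```python
# Attempt the import in a throwaway interpreter, so failures are
# detected without polluting this process's sys.modules.
import subprocess
import sys

def import_ok(fullname):
    """Return True iff `import <fullname>` succeeds in a fresh interpreter."""
    proc = subprocess.run([sys.executable, '-c', 'import ' + fullname],
                          capture_output=True)
    return proc.returncode == 0

# e.g. for the question's layout: import_ok('game.game1')
```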

Jonathan Vanasco
  • Unloading a module is totally dependent on the Python interpreter, and you cannot be sure when it will do that. You may use [`delete_module`](http://utilspie.readthedocs.io/en/latest/#delete-module) (from a library I created), but here too you have to be sure that the *imported* module does not hold any reference, and it is totally up to the interpreter when it will release the memory – Moinuddin Quadri Jan 27 '17 at 16:25
  • Unloading packages is not an option - it's advised against everywhere. I am looking for a non-hackish way - for _the one right way_. If I wanted to go into the pain/hack of unloading packages I would already do that - it's not easy. And loading an interpreter seems like a lot of overhead - nor am I sure it will not leave the module in the namespace – Mr_and_Mrs_D Jan 27 '17 at 16:27
  • Well, you're asking to do a hackish thing. I would advise against the "compile" options because they require you to access the modules as files -- which means you need to know the difference between "foo.py" and "foo/__init__.py", AND those files could be outside your Python path. If you spin up a new interpreter using subprocess, modules would be loaded into that interpreter's process – not yours. – Jonathan Vanasco Jan 28 '17 at 01:09
  • No I am not, and even if I were, my question is different. You embark on explaining pitfalls that do not apply - I state clearly that I know my folder structure and give the loop that traverses it - so why did you add a whole paragraph of compile pitfalls (that is not the path I'm considering; I believe a kind of eval is needed)? – Mr_and_Mrs_D Jan 28 '17 at 12:58
0

I believe imp.find_module satisfies at least some of your requirements: https://docs.python.org/2/library/imp.html#imp.find_module

A quick test shows that it does not trigger an import:

>>> import imp
>>> import sys
>>> len(sys.modules)
47
>>> imp.find_module('email')
(None, 'C:\\Python27\\lib\\email', ('', '', 5))
>>> len(sys.modules)
47
>>> import email
>>> len(sys.modules)
70

Here's an example usage in some of my code (which attempts to classify modules): https://github.com/asottile/aspy.refactor_imports/blob/2b9bf8bd2cf22ef114bcc2eb3e157b99825204e0/aspy/refactor_imports/classify.py#L38-L44
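For what it's worth, imp is deprecated in Python 3 (and removed in 3.12); the closest modern equivalent appears to be importlib.util.find_spec, with the same caveat: locating a submodule imports its parent package(s), but not the submodule itself.

```python
# importlib.util.find_spec as the Python 3 stand-in for imp.find_module.
# Locating 'email.message' imports the parent package 'email' but does
# not execute email/message.py itself.
import importlib.util
import sys

spec = importlib.util.find_spec('email.message')
print(spec is not None)                 # True: the module can be located
print('email' in sys.modules)           # True: the parent got imported
print('email.message' in sys.modules)   # False: the target did not
```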

anthony sottile
0

We already had a custom importer (disclaimer: I did not write that code, I'm just the current maintainer) whose load_module reads:

def load_module(self,fullname):
    if fullname in sys.modules:
        return sys.modules[fullname]
    else: # set to avoid reimporting recursively
        sys.modules[fullname] = imp.new_module(fullname)
    if isinstance(fullname,unicode):
        filename = fullname.replace(u'.',u'\\')
        ext = u'.py'
        initfile = u'__init__'
    else:
        filename = fullname.replace('.','\\')
        ext = '.py'
        initfile = '__init__'
    try:
        if os.path.exists(filename+ext):
            with open(filename+ext,'U') as fp:
                mod = imp.load_source(fullname,filename+ext,fp)
                sys.modules[fullname] = mod
                mod.__loader__ = self
        else:
            mod = sys.modules[fullname]
            mod.__loader__ = self
            mod.__file__ = os.path.join(os.getcwd(),filename)
            mod.__path__ = [filename]
            #init file
            initfile = os.path.join(filename,initfile+ext)
            if os.path.exists(initfile):
                with open(initfile,'U') as fp:
                    code = fp.read()
                exec compile(code, initfile, 'exec') in mod.__dict__
        return mod
    except Exception as e: # wrap in ImportError a la python2 - will keep
        # the original traceback even if import errors nest
        print 'fail', filename+ext
        raise ImportError, u'caused by ' + repr(e), sys.exc_info()[2]

So I thought I could replace the parts that access the sys.modules cache with overridable methods, which in my override would leave that cache alone:

@@ -48,2 +55,2 @@ class UnicodeImporter(object):
-        if fullname in sys.modules:
-            return sys.modules[fullname]
+        if self._check_imported(fullname):
+            return self._get_imported(fullname)
@@ -51 +58 @@ class UnicodeImporter(object):
-            sys.modules[fullname] = imp.new_module(fullname)
+            self._add_to_imported(fullname, imp.new_module(fullname))
@@ -64 +71 @@ class UnicodeImporter(object):
-                    sys.modules[fullname] = mod
+                    self._add_to_imported(fullname, mod)
@@ -67 +74 @@ class UnicodeImporter(object):
-                mod = sys.modules[fullname]
+                mod = self._get_imported(fullname)

and define:

class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = {}

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _get_imported(self, fullname):
        try:
            return sys.modules[fullname]
        except KeyError:
            return self._modules_to_discard[fullname]

    def _add_to_imported(self, fullname, mod):
        self._modules_to_discard[fullname] = mod

    @classmethod
    def cleanup(cls):
        cls._modules_to_discard.clear()

Then I added the importer to sys.meta_path and was good to go:

importer = sys.meta_path[0]
fake_importer = FakeUnicodeImporter  # the class defined above
try:
    if not hasattr(sys, 'frozen'):
        sys.meta_path = [fake_importer()]
    perform_the_imports()  # see question
finally:
    fake_importer.cleanup()
    sys.meta_path = [importer]

Right ? Wrong!

Traceback (most recent call last):
  File "bash\bush.py", line 74, in __supportedGames
    module = __import__('game',globals(),locals(),[modname],-1)
  File "Wrye Bash Launcher.pyw", line 83, in load_module
    exec compile(code, initfile, 'exec') in mod.__dict__
  File "bash\game\game1\__init__.py", line 29, in <module>
    from .constants import *
ImportError: caused by SystemError("Parent module 'bash.game.game1' not loaded, cannot perform relative import",)

Huh? I am currently importing that very same module. Well, the answer is probably in import's docs:

If the module is not found in the cache, then sys.meta_path is searched (the specification for sys.meta_path can be found in PEP 302).

That's not completely to the point, but my guess is that the statement from .constants import * looks up sys.modules to check if the parent module is there, and I see no way of bypassing that (note that our custom loader uses the builtin import mechanism for the modules themselves; mod.__loader__ = self is set after the fact).
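This sys.modules dependency can be demonstrated in isolation with a minimal sketch (Python 3 shown; fakepkg and sibling are made-up names): the machinery behind a relative import resolves the parent package through sys.modules, so hiding modules from that cache breaks relative imports inside the package being loaded.

```python
# Relative imports resolve the parent package through sys.modules.
import sys
import types

parent = types.ModuleType('fakepkg')
parent.__path__ = []                       # an empty __path__ marks it as a package
sys.modules['fakepkg'] = parent

ns = {'__name__': 'fakepkg.child', '__package__': 'fakepkg'}
try:
    exec(compile('from . import sibling', '<child>', 'exec'), ns)
except ImportError as e:
    first_error = str(e)    # only the missing sibling is the problem

del sys.modules['fakepkg']                 # now hide the parent from the cache
try:
    exec(compile('from . import sibling', '<child>', 'exec'), ns)
except ImportError as e:
    second_error = str(e)   # the parent itself can no longer be resolved

print(first_error)
print(second_error)
```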

So I updated my FakeImporter to use the sys.modules cache and then clean that up.

class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = set()

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _add_to_imported(self, fullname, mod):
        super(FakeUnicodeImporter, self)._add_to_imported(fullname, mod)
        self._modules_to_discard.add(fullname)

    @classmethod
    def cleanup(cls):
        for m in cls._modules_to_discard: del sys.modules[m]

This however blew up in a new way - or rather two ways:

  • a reference to the game/ package was held in the bash top-level package instance in sys.modules:

    bash\
      __init__.py
      the_code_in_question_is_here.py
      game\
        ...
    

    because game is imported as bash.game. That reference held references to all the game1, game2, ..., subpackages, so those were never garbage collected

  • a reference to another module (brec) was held as bash.brec by the same bash module instance. This reference had been imported as from .. import brec in game\game1 without triggering an import, in order to update SomeClass. However, in yet another module, an import of the form from ...brec import SomeClass did trigger an import, and another instance of the brec module ended up in sys.modules. That instance had a non-updated SomeClass and blew up with an AttributeError.

Both were fixed by manually deleting those references - so gc collected all the modules (5 MB of RAM out of 75) and from .. import brec did trigger an import (this from ... import foo vs from ...foo import bar difference warrants a question of its own).

The moral of the story is that it is possible but:

  • the package and subpackages should only reference each other
  • all references to external modules/packages should be deleted from top level package attributes
  • the package reference itself should be deleted from top level package attribute

If this sounds complicated and error-prone, it is - but at least now I have a much cleaner view of the interdependencies and their perils - time to address that.


This post was sponsored by PyDev's debugger - I found the gc module very useful in grokking what was going on (tips from here). Of course there were a lot of variables that belonged to the debugger, which complicated things.


Mr_and_Mrs_D