
Can pickle/dill/cPickle be used to pickle an imported module to improve import speed? The Shapely module, for example, takes 5 seconds on my system to find and load all of the required dependencies, which I'd really like to avoid.

Can I pickle my imports once, then reuse that pickle instead of having to do slow imports every time?

Brian
  • What about pickling makes you think it would be faster than the standard way in which modules are loaded? – dimo414 Jan 22 '16 at 05:18
  • If it's all in one file then it doesn't have to search through a large sys.path looking for the module. – Brian Jan 22 '16 at 05:27
  • 1
    It's unlikely that searching `sys,path` is the source of significant slowness. – BrenBarn Jan 22 '16 at 05:28

3 Answers


No. First and foremost, you can't pickle modules; you'll get an error:

>>> import pickle, re
>>> pickle.dump(re, open('/tmp/re.p', 'wb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'module'>: attribute lookup module on builtins failed

More conceptually, even if you could serialize a module, you'd only be increasing the amount of work Python has to do.

Normally, when you say import module, Python has to:

  1. Find the location of the module (usually a file on the file system)
  2. Parse the source code into byte code in memory (and if possible store that parsed byte code as a .pyc file), or load a .pyc into memory directly if one exists
  3. Execute any code that is supposed to run when the module first loads
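
For illustration, here is a minimal sketch that makes those three steps visible using the standard importlib machinery (`re` is just an arbitrary example module):

import importlib.util
import time

# Step 1: find the module on sys.path.
spec = importlib.util.find_spec("re")
print(spec.origin)  # the file Python located

# Steps 2 and 3: load/compile the bytecode, then execute the module body.
start = time.perf_counter()
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print("executed in %.4f seconds" % (time.perf_counter() - start))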

If you were to pickle a module in some way, you would essentially be replacing step 2 with your own half-baked solution.

  1. Find the location of the pickle (usually a file on the file system)
  2. Unpickle it back into a Python module
  3. Execute any code that is supposed to run when the module first loads

We can safely assume that unpickling would be slower than Python's built-in bytecode format, because if it weren't, Python would use pickling under the covers anyway.

More to the point, parsing a Python file is not (very) expensive, and will hardly take any time at all. Any real slowdown would occur in step 3, and we haven't changed that. You might be asking if there's some way to skip step 3 with pickling, but in the general case, no, that is not possible, because there's no way to guarantee that a module doesn't make changes to the rest of the environment.
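
To make that last point concrete, here is a hypothetical module whose top-level code alters process-wide state when it runs; skipping its execution would leave the program in a different state than a real import would:

# side_effects.py (hypothetical) -- everything below runs during step 3, at import time.
import logging
import os

logging.basicConfig(level=logging.DEBUG)  # reconfigures logging for the whole process
os.environ["MY_LIB_READY"] = "1"          # mutates the process environment

REGISTRY = {}                             # other modules may rely on this existing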

Now you might know something special about the Shapely module in particular that lets you say "all the work Shapely does when imported could safely be cached between runs". In that case the right course of action is to contribute such caching behavior to the library and cache the data Shapely is loading, not the code Python is importing.
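
As a sketch of that pattern (the cache path and the build_expensive_data() helper below are hypothetical stand-ins for whatever data is actually slow to produce):

import os
import pickle

CACHE_PATH = "/tmp/expensive_data.pickle"  # hypothetical cache location


def build_expensive_data():
    # Stand-in for the slow, import-time work that is worth caching.
    return {"lookup_table": list(range(10 ** 6))}


def load_or_build():
    # On later runs the pickled result is loaded instead of being rebuilt.
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    data = build_expensive_data()
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(data, f)
    return data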

dimo414

While dill can serialize a module, you can see from how it serializes one that it does not save any work over a normal import. When dill serializes a module, all it does is call a function that in turn imports the module. So, as @dimo414 states, the answer is no.

>>> import dill
>>> import re
>>> _re = dill.dumps(re)
>>> re_ = dill.loads(_re)
>>> re_
<module 're' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.pyc'>
>>> _re
'\x80\x02cdill.dill\n_import_module\nq\x00U\x02req\x01\x85q\x02Rq\x03.'
>>> 
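
A quick way to see this: drop the already-imported module from sys.modules and load the pickle again; the unpickle just re-runs a normal import (a small sketch):

import sys
import dill
import re

payload = dill.dumps(re)
del sys.modules["re"]            # forget the cached module

re_again = dill.loads(payload)   # this just does a fresh "import re" internally
print("re" in sys.modules)       # True: the unpickle performed a real import
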
Mike McKerns

The import latency is most likely due to loading the dependent shared objects of the GEOS library.

Optimising this could maybe be done, but it would be very hard. One way would be to build a statically compiled custom Python interpreter with all DLLs and extension modules built in. But maintaining that would be a major PITA (trust me - I do it for work).

Another option is to turn your application into a service, so you only incur the runtime cost of starting the interpreter once.

Whether this is suitable depends on your actual problem.
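
If the service route fits, a minimal sketch using Python 3's standard socketserver module might look like this (the port, the one-line request protocol, and the Shapely call inside handle() are made up for illustration):

import socketserver

from shapely.geometry import Point  # the slow Shapely import happens once, at startup


class AreaHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Made-up protocol: one "x y radius" line per request; reply with the buffered area.
        x, y, r = map(float, self.rfile.readline().split())
        area = Point(x, y).buffer(r).area
        self.wfile.write(("%s\n" % area).encode())


if __name__ == "__main__":
    with socketserver.TCPServer(("127.0.0.1", 9000), AreaHandler) as server:
        server.serve_forever()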

deets