25

I'm getting seriously frustrated at how slow Python startup is. Just importing more-or-less basic modules takes a second, since Python runs down sys.path looking for matching files (and generating four stat() calls - ["foo", "foo.py", "foo.pyc", "foo.so"] - for each entry it checks). For a complicated project environment with tons of different directories, this can take around 5 seconds -- all to run a script that might fail instantly.
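
For reference, here's a rough sketch of the kind of measurement I'm talking about (numpy is just an example; substitute whichever heavyweight module you care about):

    import sys
    import time

    start = time.time()
    import numpy                      # example module; swap in whatever feels slow
    elapsed = time.time() - start
    print("import took %.3f s; sys.path has %d entries" % (elapsed, len(sys.path)))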

Do folks have suggestions for how to speed up this process? For instance, one hack I've seen is to set the LD_PRELOAD_32 environment variable to a library that caches ENOENT results (i.e. failed stat() calls) between runs. Of course, this has all sorts of problems (potentially confusing non-Python programs, negative caching going stale, etc.).

YGA
  • I don't know whether `import some_module` is slower than `from some_module import a_function, some_class, some_other_class` etc. but you could try it and see. – Tyler Jan 06 '10 at 01:01
  • Here's something to consider: Rather than the load time being attributed to actually reading the Python packages from disk, have you considered that some modules may load data files or run a certain amount of computation on import? That is, any .py file can have arbitrary Python code that runs, rather than simply declarations, which may be the source of the slowness. – BrainCore Jan 06 '10 at 03:11
  • @MatrixFrog: It's no faster, python loads and evaluates the whole module, then imports the specified objects into the local namespace. I was hacking at some similar functionality a while back. – richo Jan 06 '10 at 03:28

5 Answers

11

Zipping up as many .pyc files as feasible (with proper directory structure for packages), and putting that zipfile as the very first entry in sys.path (on the best available local disk, ideally), can speed up startup times a lot.
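
For concreteness, here's a minimal sketch of one way to build such an archive with the standard zipfile module (the paths below are examples, not anything specific to your setup):

    import os
    import zipfile

    SITE_PACKAGES = "/path/to/site-packages"   # example source tree; adjust to your layout
    ARCHIVE = "/var/tmp/pycache.zip"           # example target; pick your fastest local disk

    zf = zipfile.ZipFile(ARCHIVE, "w", zipfile.ZIP_STORED)  # stored, not compressed
    for dirpath, dirnames, filenames in os.walk(SITE_PACKAGES):
        for name in filenames:
            if name.endswith(".pyc"):
                full = os.path.join(dirpath, name)
                # keep the path relative to the package root so package structure survives
                zf.write(full, os.path.relpath(full, SITE_PACKAGES))
    zf.close()

With the archive built, make it the first thing on the path, e.g. `PYTHONPATH=/var/tmp/pycache.zip python myscript.py`, or `sys.path.insert(0, "/var/tmp/pycache.zip")` before the expensive imports.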

Alex Martelli
  • Hmmm... empirically this didn't work. I tried zipping up all the .py, .so, and .pyc files in our site-packages directory (without compression) and putting the archive (in the /tmp directory) as the first element in the sys.path; it actually took 1.25 - 2.00 x longer to import numpy and django. – YGA Jan 06 '10 at 02:39
  • The `.so` files don't help and the `.py` files can actually _damage_ your performance (if Python decides it must recompile them, it can't save the compiled form) -- try using just `.pyc` ones, as I suggested. – Alex Martelli Jan 06 '10 at 03:26
  • Aha, that indeed sped it up - but only to being as fast it was before :-( – YGA Jan 06 '10 at 23:53
5

The first things that come to mind are:

  • Try a smaller path
  • Make sure your modules are pyc's so they'll load faster (see the `compileall` sketch after this list)
  • Make sure you don't double import, or import too much
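
For the .pyc point, a minimal sketch using the standard compileall module (the directory path is just an example):

    import compileall

    # Byte-compile everything under this tree ahead of time so imports find
    # ready-made .pyc files instead of compiling .py sources on first use.
    compileall.compile_dir("/path/to/your/packages", quiet=True)

The same thing is available from the command line as `python -m compileall /path/to/your/packages`.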

Other than that, are you sure that the disk operations are what's bogging you down? Is your disk/operating system really busy or old and slow?

Maybe a defrag is in order?

Seth
  • Double import has limited cost. sys.modules is a cache of already loaded modules. –  Jan 06 '10 at 01:31
  • @thouis - Limited, but sometimes significant. http://wiki.python.org/moin/PythonSpeed/PerformanceTips#ImportStatementOverhead – Seth Jan 07 '10 at 17:11
  • Yeah, I thought importing again was singleton-style. – Jason Oct 12 '17 at 14:45
4

When trying to speed things up, profiling is key. Otherwise, how will you know which parts of your code are really the slow ones?

A while ago, I created the runtime and import-profile visualizer tuna, and I think it may be useful here. Simply create an import profile (with Python 3.7+) and run tuna on it:

    python3.7 -X importtime -c "import scipy" 2> scipy.log
    tuna scipy.log

[Screenshot: tuna's visualization of the scipy import profile]

Nico Schlömer
2

If you run out of options, you can create a ramdisk to store your Python packages. A ramdisk appears as a directory in your file system, but is actually mapped directly to your computer's RAM. Here are some instructions for Linux/Red Hat.

Beware: A ramdisk is volatile, so you'll also need to keep a backup of your files on your regular hard drive, otherwise you'll lose your data when your computer shuts down.
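
Once the ramdisk is mounted, a minimal sketch of pointing Python at it (the mount point and package directory below are examples, and mounting the tmpfs itself is covered by the instructions above):

    import shutil
    import sys

    # Copy the packages onto the already-mounted ramdisk once
    # (copytree fails if the destination directory already exists),
    # then put the ramdisk copy ahead of everything else on sys.path.
    shutil.copytree("/path/to/site-packages", "/mnt/ramdisk/site-packages")
    sys.path.insert(0, "/mnt/ramdisk/site-packages")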

BrainCore
  • In my experience, this doesn't help since the linux kernel caches reads from disk aggressively. Virtual memory has gotten so good in linux that (usually) the only difference between a buffer in ram and one in a file is that the file has a filename and persists. – Andrew Wagner Jun 16 '16 at 09:24
0

Something's missing from your premise -- I've never seen "more-or-less" basic modules take over a second to import, and I'm not running Python on what I would call cutting-edge hardware. Either you're running on some seriously old hardware, or on an overloaded machine, or your OS or Python installation is broken in some way. Or you're not really importing "basic" modules.

If it's any of the first three issues, you need to look at the root problem for a solution. If it's the last, we really need to know what the specific packages are to be of any help.

Chris B.