The timeit module is great for measuring the execution time of small code snippets, but when the code changes global state (like import does) it's really hard to get accurate timings.

For example, if I want to time how long it takes to import a module, the first import will take much longer than subsequent imports, because by then the submodules and dependencies are already imported and the files are already cached. So using a bigger number of repeats, as in:

>>> import timeit
>>> timeit.timeit('import numpy', number=1)
0.2819331711316805

>>> # Start a new Python session:
>>> timeit.timeit('import numpy', number=1000)
0.3035142574359181

doesn't really work, because the time for one execution is almost the same as for 1000 rounds. I could execute the command to "reload" the package:

>>> timeit.timeit('imp.reload(numpy)', 'import importlib as imp; import numpy', number=1000)
3.6543283935557156

But the fact that 1000 reloads are only about 10 times slower than a single first import suggests this isn't accurate either.

It also seems impossible to unload a module entirely ("Unload a module in Python").
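For example, removing the top-level entry from `sys.modules` doesn't really help, because all the submodules stay cached (a small sketch using the stdlib `json` package as a stand-in for numpy, so it runs without third-party dependencies):

```python
import sys

import json  # pulls in json.decoder, json.scanner, ...
assert "json.decoder" in sys.modules

del sys.modules["json"]  # removes only the top-level entry

# Re-importing re-executes json/__init__.py, but its submodules are
# still cached in sys.modules, so most of the original work is skipped.
import json
print("json.decoder" in sys.modules)  # the submodule never left
```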

So the question is: what would be an appropriate way to accurately measure the import time?

MSeifert
  • Good question. Just an idea, maybe it's bad: run the interpreter 100 times on an empty script, then 100 times with `import numpy`, and subtract. – Jean-François Fabre May 15 '17 at 21:44
  • I think it's worth asking why you want to do this... Since this is a price you only pay once per program execution, it's probably sufficient to time the import once or twice to get a sense for how slow it feels... hard numbers probably don't mean as much here because you're never going to hit the slow path often enough for your statistics to actually mean something anyway... – mgilson May 15 '17 at 21:45
  • @Jean-FrançoisFabre You mean like creating a script that calls `python -m timeit 'import numpy' -r 1` from the command line? – MSeifert May 15 '17 at 21:54
  • @MSeifert yes. And another which calls `python -m timeit 'pass'` – Jean-François Fabre May 15 '17 at 21:57
  • @mgilson I try to improve the import time of one of my packages. I already did some profiling and `number=1` timings, but it's hard to see even a 10% improvement (or regression) from 5-10 manual import timings. The reason **why** I try to reduce the import time is that I wanted to create some CLI scripts, and these have to pay the "import" time every time the script is called (and the import time seems to be the major bottleneck there). – MSeifert May 15 '17 at 21:59
  • @Jean-FrançoisFabre Do you want to wrap that up as an answer? It definitely sounds like a great idea. Haven't thought about the CLI until you mentioned scripts. :) – MSeifert May 15 '17 at 22:09

2 Answers

Since it's nearly impossible to fully unload a module, the idea is to measure the start-up of a fresh interpreter instead.

You could run a loop in a Python script that launches one Python process importing numpy and another one doing nothing, then subtract the two and average:

import subprocess
import sys
import time

n = 100
python_load_time = 0.0
numpy_load_time = 0.0

for i in range(n):
    # fresh interpreter that imports numpy
    s = time.perf_counter()
    subprocess.call([sys.executable, "-c", "import numpy"])
    numpy_load_time += time.perf_counter() - s

    # fresh interpreter that does nothing (baseline for interpreter start-up)
    s = time.perf_counter()
    subprocess.call([sys.executable, "-c", "pass"])
    python_load_time += time.perf_counter() - s

print("average numpy load time = {}".format((numpy_load_time - python_load_time) / n))
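A variant of the same idea that keeps the individual samples, so you can take the minimum rather than the mean (the minimum is usually less sensitive to background load). This is only a sketch: `startup_times` is a hypothetical helper, and the stdlib `json` module stands in for numpy so it runs without third-party packages:

```python
import subprocess
import sys
import time

def startup_times(code, n=5):
    """Wall-clock times for n fresh interpreters running `code`."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        subprocess.call([sys.executable, "-c", code])
        times.append(time.perf_counter() - start)
    return times

# json stands in for numpy here; substitute "import numpy" as needed
baseline = min(startup_times("pass"))
with_import = min(startup_times("import json"))
print("approx. import cost: {:.4f}s".format(with_import - baseline))
```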
Jean-François Fabre

I think nowadays (2023) the preferred method is `python -X importtime -c 'import numpy'` (or, if you want to save the results to a file, `python -X importtime -c 'import numpy' 2> import-timing-results.txt`). Available since Python 3.7. Link to docs.
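To sketch what this looks like (using the stdlib `json` module as a stand-in so it runs without numpy installed): the report goes to stderr, one line per imported module with self and cumulative times in microseconds, and the last line should be the top-level module itself:

```shell
# -X importtime writes its per-module report to stderr;
# show only the last few lines of the import tree
python3 -X importtime -c 'import json' 2>&1 | tail -n 3
```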

drammock