
I'm using numba to develop my package, and it actually gives me many-fold speed-ups in the calculations. Now I'm facing the problem that, when I run my package from the command line using

python my_package_name.py

a lot of time is spent on importing. To demonstrate this, I tested with -X importtime:

python -X importtime test_my_package.py

The results are:

import time: self [us] | cumulative | imported package
...
import time:      2422 |       3620 |               numpy.core._multiarray_umath
import time:       425 |        425 |                   numpy.compat._inspect
import time:        62 |         62 |                       errno
import time:       439 |        439 |                         urllib
import time:      1199 |       1638 |                       urllib.parse
import time:      1036 |       2735 |                     pathlib
...
import time:      1408 |       2748 |                     pickle
...
import time:     15060 |      24697 |           numpy.core._add_newdocs_scalars
...
import time:      4627 |     111112 |       numpy
...
import time:      1043 |      31561 |           numba.core.config
...
import time:      1111 |       5296 |                 numba.core.errors
import time:       434 |       5729 |               numba.core.types.common
import time:       566 |        566 |                   numba.core.typeconv.castgraph
import time:       379 |        944 |                 numba.core.typeconv
import time:       331 |        331 |                   numba.core.consts
import time:      1104 |       1435 |                 numba.core.ir
import time:      1083 |       3461 |               numba.core.types.misc
import time:      1293 |      10483 |             numba.core.types.containers
import time:      2129 |       2129 |               logging
import time:      1775 |       3904 |             numba.core.types.functions
...
import time:      9340 |       9340 |             scipy._distributor_init
import time:      1915 |       1915 |             scipy._lib._pep440
import time:       562 |        562 |               scipy._lib._ccallback_c
import time:       952 |       1513 |             scipy._lib._ccallback
import time:      2882 |      17989 |           scipy
import time:      3957 |     211268 |         numba
...
import time:  13506351 |   13834005 |       my_package.utilities
...
import time:   3029710 |    3029710 |       my_package.extract_features
import time:   4805845 |    4805845 |       my_package.fast_annotate_spectrum
...
import time:    345628 |     345628 |         my_package.modification_correction
...
import time:   3769461 |    3769461 |             my_package.xxx.fast_annotate_spectrum
...
import time:   4831766 |    4847170 |               my_package.xxx.utilities
...
import time:   9825949 |    9825949 |             my_package.machinelearning.utilities
...
import time:     55041 |     102863 |                                   scipy.stats._continuous_distns
...
import time:   1178353 |   11814805 |           my_package.machinelearning.randomforest
...
import time:   2767857 |    2767857 |       my_package.retrieval_utilities

The list is really long (>1300 lines), so I removed the entries with small import times but kept some of the relatively large entries from importing numba, numpy and scipy as benchmarks. Clearly, the import times for the modules in my_package are very large, in one case even more than 10 s (e.g., my_package.utilities at ~13.5 s). These modules contain all the functions I implemented for the calculations, which are accelerated using numba.njit, i.e., decorated with @numba.njit.

Since the import times of all the other modules are quite normal compared to Python's built-in modules, I suspect that the large import time comes from preparing those functions for numba compilation (by @numba.njit). Indeed, when I commented out some of the @numba.njit decorators in the module my_package.utilities to make them plain Python functions, the import time dropped dramatically:

import time:   1921836 |    2647152 |       my_package.utilities
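
For reference, the functions in these modules look roughly like this (a minimal hypothetical sketch, not my real code). Note that a bare @numba.njit defers compilation until the first call, while an eager signature like the one below forces compilation at import time:

import numba

# Hypothetical example; passing an explicit signature makes numba
# compile eagerly, i.e. when the module is imported.
@numba.njit("f8(f8[:])")
def total_intensity(spectrum):
    s = 0.0
    for x in spectrum:
        s += x
    return s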

Is there any way I can improve this?

Elkan

1 Answer


You can cache the njit functions:

@numba.njit(cache=True)

This will reload the compiled functions the next time you run your program.
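
For example (a minimal sketch with a hypothetical function; the first run compiles and writes the cache to disk, later runs load it instead of recompiling):

import numba
import numpy as np

# cache=True stores the compiled machine code in a __pycache__
# directory next to this source file, so later runs skip compilation.
@numba.njit(cache=True)
def pairwise_sum(a, b):
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out

print(pairwise_sum(np.arange(5.0), np.arange(5.0)))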

However, it can get a bit tricky to delete the cached functions if you need to make a change.
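
If you do need to clear the cache by hand, something like this works (a rough sketch, assuming the default cache location: .nbi/.nbc files in the __pycache__ directory next to your module):

import glob
import os

# Numba's on-disk cache lives alongside the source file by default,
# e.g. my_package/__pycache__/utilities.<func>-<hash>.py39.nbc
for path in glob.glob("my_package/__pycache__/*.nb[ci]"):
    os.remove(path)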

Tom McLean
  • Great! Thanks. Can you elaborate on `however it can get a bit tricky to delete the cached functions if you need to make a change`? – Elkan Sep 01 '22 at 11:20
  • @Elkan The cached files are hidden away in your computer, probably in the generated `__pycache__` folder https://stackoverflow.com/questions/44131691/how-to-clear-cache-or-force-recompilation-in-numba – Tom McLean Sep 01 '22 at 11:27
  • I read a bit of the `numba` docs about the cache; if I change the source code, will the saved cache be updated? Or do I have to remove the previous cache file and re-compile? – Elkan Sep 02 '22 at 02:02
  • @Elkan I believe you need to delete the cache. – Tom McLean Sep 02 '22 at 05:29
  • If you change the source code, you do not actually need to delete the cache. – Larry Panozzo Dec 17 '22 at 16:08