I have a fairly large codebase that uses numba, and I have noticed that when caching is enabled for a function that calls another numba-compiled function defined in a different file, changes to the called function are not picked up. The situation occurs when I have two files:

testfile2:

import numba

@numba.njit(cache=True)
def function1(x):
    return x * 10

testfile:

import numba
from tests import testfile2

@numba.njit(cache=True)
def function2(x, y):
    return y + testfile2.function1(x)

If in a jupyter notebook, I run the following:

# INSIDE JUPYTER NOTEBOOK
import sys
sys.path.insert(1, "path/to/files/")
from tests import testfile

testfile.function2(3, 4)
>>> 34   # good value

However, if I then change testfile2 to the following:

import numba

@numba.njit(cache=True)
def function1(x):
    return x * 1

Then I restart the Jupyter notebook kernel and rerun the notebook, and I get the following:

import sys
sys.path.insert(1, "path/to/files/")
from tests import testfile

testfile.function2(3, 4)
>>> 34   # bad value, should be 7

Importing both files into the notebook does not fix the bad result, and neither does setting cache=False only on function1. What does work is setting cache=False on all njit'ted functions, then restarting the kernel and rerunning.

I believe that LLVM is probably inlining some of the called functions and then never checking them again.

I looked in the source and found a cache class, numba.caching.NullCache, so I instantiated a cache object and ran the following:

cache = numba.caching.NullCache()
cache.flush()

Unfortunately that appears to have no effect.

Is there a numba environment setting, or another way I can manually clear all cached functions within a conda env? Or am I simply doing something wrong?

I am running numba 0.33 with Anaconda Python 3.6 on Mac OS X 10.12.3.

Greg Jennings
    Update: killing all of the files in the `__pycache__` directory numba uses *does* seem to work. Not sure if there is a better way, however. – Greg Jennings May 23 '17 at 10:40

3 Answers

I "solved" this with a hack after seeing Josh's answer, by adding a utility function to the project that kills off the cache.

There is probably a better way, but this works. I'm leaving the question open in case someone has a less hacky way of doing this.

import os


def kill_files(folder):
    """Delete every regular file directly inside `folder`."""
    for the_file in os.listdir(folder):
        file_path = os.path.join(folder, the_file)
        try:
            if os.path.isfile(file_path):
                os.unlink(file_path)
        except Exception as e:
            print("failed on filepath: %s (%s)" % (file_path, e))


def kill_numba_cache():
    """Empty every __pycache__ directory below the project root."""
    # Resolves to the parent of the directory that contains this file.
    root_folder = os.path.realpath(__file__ + "/../../")

    for root, dirnames, filenames in os.walk(root_folder):
        for dirname in dirnames:
            if dirname == "__pycache__":
                try:
                    kill_files(os.path.join(root, dirname))
                except Exception as e:
                    print("failed on %s (%s)" % (root, e))
Greg Jennings
    A couple years later, I haven't found a better way and I'm still using the above function without issue, so I'm marking it as the correct one. Anyone from the numba project who has a better way, let me know! – Greg Jennings Jul 10 '19 at 18:13

This is a bit of a hack, but it's something I've used before. If you put this function in the top-level of where your numba functions are (for this example, in testfile), it should recompile everything:

import inspect
import sys

def recompile_nb_code():
    this_module = sys.modules[__name__]
    module_members = inspect.getmembers(this_module)

    for member_name, member in module_members:
        if hasattr(member, 'recompile') and hasattr(member, 'inspect_llvm'):
            member.recompile()

and then call it from your jupyter notebook when you want to force a recompile. The caveat is that it only works on files in the module where this function is located and their dependencies. There might be another way to generalize it.
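
One way to generalize this across a multi-file project is to walk `sys.modules` instead of a single module. The following is a minimal sketch, not a tested numba recipe: `recompile_all` and its `package_prefix` parameter are names made up here, and it relies on the same duck-typing check as above (`recompile` and `inspect_llvm` attributes) to spot dispatchers. It only sees modules that have already been imported.

```python
import sys


def recompile_all(package_prefix):
    """Call recompile() once on every dispatcher-like object found in
    already-imported modules whose name starts with package_prefix.

    Hypothetical helper: the name and the prefix filter are assumptions.
    Returns the qualified names of the objects that were recompiled.
    """
    seen = set()       # ids of objects already handled (avoids duplicates)
    recompiled = []
    # Copy the items because sys.modules may change while we iterate.
    for module_name, module in list(sys.modules.items()):
        if module is None or not module_name.startswith(package_prefix):
            continue
        members = getattr(module, "__dict__", {})
        for member_name, member in list(members.items()):
            # Same duck-typing test as recompile_nb_code above.
            if not (hasattr(member, "recompile") and hasattr(member, "inspect_llvm")):
                continue
            if id(member) in seen:
                continue
            seen.add(id(member))
            member.recompile()
            recompiled.append(f"{module_name}.{member_name}")
    return recompiled
```

With the layout from the question, this would be called as `recompile_all("tests")` after importing `testfile`.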

JoshAdel
    Thanks Josh. It didn't quite work for me because I have a large project with a lot of files and packages. But upvoted, because I used this idea to solve it with another "hack" solution, which I put below. – Greg Jennings May 23 '17 at 12:17

The official Numba documentation recommends removing the cache directory to clear the cache [link].

The Numba cache is saved in these four directories [link]:

  • numba.config.CACHE_DIR
  • __pycache__
  • numba.misc.appdirs.user_cache_dir()
  • IPython.paths.get_ipython_cache_dir()
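
If hunting through several locations is inconvenient, the `NUMBA_CACHE_DIR` environment variable (the source of `numba.config.CACHE_DIR`, in Numba versions that support it) can point the cache at a single directory that is easy to wipe. A minimal sketch, assuming the variable is set before `numba` is first imported in the process; `use_fresh_numba_cache` is a hypothetical helper name.

```python
import os
import shutil


def use_fresh_numba_cache(cache_dir):
    """Empty cache_dir and point NUMBA_CACHE_DIR at it.

    Hypothetical helper. Call it before the first `import numba`,
    because numba reads the variable when it loads its configuration.
    """
    cache_dir = os.path.abspath(cache_dir)
    # Remove anything cached by earlier runs, then recreate the directory.
    shutil.rmtree(cache_dir, ignore_errors=True)
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["NUMBA_CACHE_DIR"] = cache_dir
    return cache_dir
```

Calling this at the top of a notebook trades one full recompilation per kernel start for never loading a stale entry.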

You can locate the Numba cache by searching for *.nbi files [link].
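
That search can be sketched with `pathlib`. This is an illustration rather than an official tool: `find_numba_cache_files` is a made-up name, and matching the `.nbc` data files alongside the `.nbi` index files goes beyond what is stated above. The function only lists files; deletion is left to the caller.

```python
from pathlib import Path


def find_numba_cache_files(root="."):
    """Return a sorted list of numba cache files found below `root`.

    Hypothetical helper: looks for *.nbi (index) and *.nbc (data) files.
    """
    root = Path(root)
    found = []
    for pattern in ("*.nbi", "*.nbc"):
        # rglob searches `root` and all of its subdirectories.
        found.extend(p for p in root.rglob(pattern) if p.is_file())
    return sorted(found)
```

To clear the cache, call `path.unlink()` on each returned path. Unlike deleting whole `__pycache__` directories, this leaves ordinary `.pyc` files in place.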


I personally use the following two snippets. The first is adapted from the question "Python3 project remove __pycache__ folders and .pyc files".

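import pathlib
import shutil

# Remove every __pycache__ directory below the current working directory.
for p in list(pathlib.Path('.').rglob('__pycache__')):
    shutil.rmtree(p)

import os
import shutil

import IPython

# Remove the numba cache kept inside the IPython cache directory, if present.
path_parent = IPython.paths.get_ipython_cache_dir()

if path_parent:
    path_child = os.path.join(path_parent, 'numba_cache')
    if os.path.isdir(path_child):
        shutil.rmtree(path_child)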
J. Choi