I have a MATLAB MEX library that loads a problem-specific cubin file at runtime. MATLAB calls this MEX function a few hundred times. Does CUDA reload the kernel every time I call cuModuleLoad, or is the module cached somehow? If it is not cached, is there a way to keep the loaded modules alive between calls? I'm not currently calling cuModuleUnload.
It seems that the CUDA context is created only once per MATLAB process, since only the first call to the library is slow and subsequent MATLAB calls to the MEX library are fast. So I guess I can assume that the same CUDA context is being reused.
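To make the "persist the loaded modules" idea concrete, here is a minimal sketch of the caching pattern I have in mind. It rests on assumptions I have not verified: that the context really is reused, and that MATLAB keeps the MEX binary (and therefore its static variables) in memory between calls until it is cleared. To keep the sketch self-contained, `ModuleHandle` and `load_module` are hypothetical stand-ins for `CUmodule` and `cuModuleLoad`, and `get_module`, `release_modules`, and `g_load_count` are names made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for CUmodule so the sketch builds without cuda.h.
using ModuleHandle = int;

// Counts how often the expensive load path actually runs (illustration only).
static int g_load_count = 0;

// Hypothetical stand-in for cuModuleLoad(&module, path).
static ModuleHandle load_module(const std::string& path) {
    (void)path;  // the real call would read the cubin from this path
    ++g_load_count;
    return g_load_count;  // a fresh, distinct handle per load
}

// Cache keyed by cubin path. A function-local static lives as long as the
// binary that contains it stays loaded in the process.
static std::map<std::string, ModuleHandle>& module_cache() {
    static std::map<std::string, ModuleHandle> cache;
    return cache;
}

// Return the cached handle if this cubin was loaded before; otherwise load it
// once and remember the handle for later calls.
ModuleHandle get_module(const std::string& path) {
    assert(!path.empty());
    auto& cache = module_cache();
    auto it = cache.find(path);
    if (it != cache.end()) {
        return it->second;  // cache hit: no reload
    }
    ModuleHandle handle = load_module(path);
    cache.emplace(path, handle);
    return handle;
}

// Cleanup hook. In a real MEX file this is where cuModuleUnload would be
// called for each handle, with the function registered through mexAtExit.
void release_modules() {
    module_cache().clear();
}
```

In the actual MEX file, `mexFunction` would call `get_module` with the cubin path on every invocation, so only the first call for a given path would reach `cuModuleLoad`. Whether the cached handles remain valid depends on the same context still being current, which is exactly the part I am unsure about.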