
Question: Looking for someone more familiar with Numba AOT and any "output intermediate files" option I haven't found yet.

First off: Pythran is a Python-to-C++-to-PYD compiler. You can make it output the generated C++ files with a simple -e option to the compiler, and it honors OpenMP annotations written as comments, such as #omp parallel for. Numba, the more widely discussed of the two, is mostly used in JIT mode, but it also has an option to AOT (ahead-of-time) compile modules; AOT-compiled functions, however, lose the parallel optimizations.
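For reference, here is what such a Pythran module looks like (a minimal sketch; the file name `rowsums.py` and the function are made up, but the export comment, the #omp annotation, and the -e flag are Pythran's documented mechanisms). Because Pythran reads its directives from comments, the file is also plain, runnable Python:

```python
# rowsums.py -- a hypothetical Pythran module.
# Pythran picks up the export signature and the OpenMP hint from
# comments, so nothing here breaks the pure-Python interpreter.

# pythran export row_sums(float64[:, :])
def row_sums(a):
    n = len(a)
    out = [0.0] * n
    #omp parallel for
    for i in range(n):
        s = 0.0
        for x in a[i]:
            s += x
        out[i] = s
    return out

# Compile to a native extension:    pythran rowsums.py
# Emit the generated C++ instead:   pythran -e rowsums.py
```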

So I am using both of these libraries in a project. The Numba AOT-compiled function is much faster than the Pythran one, even though the Pythran version uses #omp parallel for loops and the Numba AOT build has no parallelism at all. Numba is clearly doing something my Pythran program is not.

So I want to see what it's doing. But when I look at the source code behind Numba's AOT entry point (from numba.pycc import CC), it appears Numba generates some intermediate representation and then compiles that into a PYD. The documentation doesn't state what that representation is, or whether it's even possible to get the preliminary files Numba generates prior to compiling so I can examine them. If they are bytecode, well, I can't read that anyway. BUT if they are in either Cython or C++ format, then it's easy to examine the optimizations being done. SO...

Looking for someone more familiar with Numba AOT and any "output intermediate files" option I haven't found yet.

Even if the answer is "NO, YOU CAN'T DO THAT," I need to hear it; then I can focus on parts of the code I can actually change.

Appreciated!

Matt
  • Basically, Numba is an LLVM frontend for a subset of Python, just as Clang is a frontend for C/C++. Both generate LLVM IR, which is compiled to machine code by LLVM; there is no C++ representation in between. https://numba.readthedocs.io/en/stable/developer/architecture.html Sometimes it makes sense to write a code snippet in both C and Python and compare the optimized LLVM IR. Please note: JIT defaults to the equivalent of -O3 -march=native, while AOT is not march=native. Also consider using caching (cache=True) instead of AOT. – max9111 Feb 26 '22 at 19:02
  • Yes, when I compile AOT with Numba I lower the target to Haswell, a 2014 processor, to ensure compatibility, since I don't know the user's PC. JIT is quite fast, but the initial compilation is slow, hence the AOT method. I do use cache=True, but the first run is always slow. Good advice though; this small function can easily be written in Cython, as it's just a dual loop over a memoryview, so it should be pretty brainless. I'll have to look up LLVM-IR though. – Matt Feb 26 '22 at 19:58

0 Answers