
I have read NumPy's tutorial and API guide, and from that helpful documentation I learned how to extend NumPy with my own C code and how to call NumPy functions from C.

However, what I really want to know is: how can I track the calling chain from the Python code down to the C implementation? In other words, how can I tell which part of NumPy's C implementation corresponds to this simple array addition?

import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
print(x + y)

Can I use a tool like gdb to track its stack frames step by step?

Or can I directly recognize the corresponding code from a naming convention? (For example, if I want to find the code for addition, can I search for something like a function named PyNumpyArrayAdd(...)?)


[EDIT] I found a very useful video about how to locate the C implementation of these basic C-implemented functions and operator overloads like "+" and "-".

https://www.youtube.com/watch?v=mTWpBf1zewc

Got this from Andras Deak via the NumPy mailing list.


[EDIT2] There is another way to track all the functions called in NumPy, using gdb. It is very heavy-handed because it will display every NumPy function that is called, including the trivial ones, and it might take some time.

First you need to download/clone the NumPy repository into your own workspace and then compile it with the -g option, which attaches debugging information to the built libraries.

Then open a terminal in the "path/to/numpy-main" directory where NumPy's setup.py lies, and run gdb on the Python interpreter (for example gdb --args python3 your_script.py, then run inside gdb).

If you want to know which functions in NumPy's C implementation are called by this single Python statement:

y = np.exp(x)

you can set breakpoints on all the functions implemented by NumPy using the gdb Python script provided by the first answer here: Can gdb set break at every function inside a directory?

Once you have loaded this Python script with source somename.py, you can run this command in gdb: rbreak-dir numpy/core/src
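
In case that link goes stale, here is a minimal sketch (my own approximation, not the exact script from that answer) of what such a gdb Python command looks like; the file-extension filter and the way paths are passed to rbreak may need adjusting to match how the source paths appear in NumPy's debug info:

# sketch of a gdb Python command like the one from the linked answer (an approximation)
import os
import gdb

class RbreakDir(gdb.Command):
    """rbreak-dir DIR: set a breakpoint on every function in the C sources under DIR."""
    def __init__(self):
        super().__init__("rbreak-dir", gdb.COMMAND_BREAKPOINTS)
    def invoke(self, arg, from_tty):
        for root, _, files in os.walk(arg):
            for name in files:
                if name.endswith((".c", ".cpp")):
                    # "rbreak FILE:REGEX" breaks on every function in FILE matching REGEX
                    gdb.execute("rbreak %s:." % os.path.join(root, name), to_string=True)

RbreakDir()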

And you can set commands for each breakpoint:

commands 1-5004
> silent
> bt 1
> c
> end

(here 1-5004 is the range of breakpoint numbers you want to attach these commands to; you can check the actual range with info breakpoints)

Whenever a breakpoint is hit, these commands will run, print the first frame of the backtrace (i.e. the info of the function you are currently in), and then continue. In this way you can track all the NumPy functions that get called. Below is a picture from my own working environment (I took a photo of the screen since there are rules preventing me from copying anything off the work computer):

pic of GDB output

I hope my trials can help future readers.

27rabbit

1 Answer


However, what I really want to know is: how can I track the calling chain from the Python code down to the C implementation? In other words, how can I tell which part of NumPy's C implementation corresponds to this simple array addition?

AFAIK, there are two main ways to do that: using a debugger, or tracking the function in the code (typically by looking at the wrapping part or by searching for keywords in numpy/core/src/XXX/). NumPy has different kinds of functions. Some focus more on the CPython interaction part (e.g. type checking, array creation, generic iterators, etc.) and some focus on the computing part (doing the computation efficiently). Depending on what you want, different files need to be inspected. core/src/umath/loops.c.src is the way to go for the core computing functions doing basic independent math operations.
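
For a quick sanity check from the Python side, you can see that the + operator on arrays is just the np.add universal function, whose inner loops are the kind of code living in that file:

import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
# ndarray.__add__ dispatches to the np.add ufunc, so both lines hit the same C inner loop
print(x + y)         # [2 4 6]
print(np.add(x, y))  # [2 4 6]
print(type(np.add))  # <class 'numpy.ufunc'>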

Can I use a tool like gdb to track its stack frames step by step?

Using a debugger is the common way to do this unless you are familiar with the NumPy code base. You can try to find the NumPy entry-point function by reading the wrapper code, but I think it is a bit difficult as this part of the code is not very readable (many related parts are generated, certainly to ease development and avoid mistakes). The hard part with GDB is finding the first entry point into NumPy: the CPython interpreter function calls are hard to track as there are many of them (sometimes called recursively), and the call stack is quite big and far from clear (i.e. there is no clear information about the actual statement/expression being executed). That being said, AFAIR, the entry point is often something like PyArray_XXX or array_XXX. You can also break on the first function that executes code from the NumPy library.

Or can I directly recognize the corresponding code from a naming convention?

Some functions have a standardized name, typically PyArray_XXX. That being said, the core computing functions generally do not. They have a name generated by a template system that parses comments and annotations and generates code based on them. For adding two arrays, the main computing function should be something like @TYPE@_add@isa@, where @TYPE@ is for example INT or LONG depending on your target platform. There is a special version (i.e. a specialization) for floating-point numbers that makes use of an optimized pair-wise summation for the sake of accuracy. This kind of naming convention is quite frequent though, so you can search for _add in the code, or for a begin repeat section with add as a kind parameter.
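
If it helps, you can also list the type signatures the add ufunc was compiled with; each entry corresponds to one generated inner loop (for instance a LONG or DOUBLE specialization; the exact generated names depend on your NumPy version), which gives you hints about what to grep for:

import numpy as np
# Each signature such as 'll->l' or 'dd->d' has its own generated C loop in loops.c.src
print(np.add.types)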


Related post: Numpy argmax source

Jérôme Richard
  • Interesting. Is this related to [NEP 38](https://numpy.org/neps/nep-0038-SIMD-optimizations.html)? – Roland Smith Sep 03 '22 at 10:17
  • @RolandSmith I think NEP38 focuses more on universal intrinsics, portability and the internal SIMD NumPy API, though the last point in the abstract is related. `@isa@` is a suffix added so as to generate a function using a specific instruction set. This was added since not all x86-64 processors support AVX/AVX2/AVX-512 and compilers generate SSE code by default (typically SSE2) for the sake of compatibility with very old processors (>15 years). The idea is to compile the functions and then do a dynamic dispatch at runtime. AFAIK, NEP38 only addresses the last part. – Jérôme Richard Sep 03 '22 at 10:59
  • I knew that numpy uses optimized BLAS libraries, but I never realized what goes on "under the covers" of numpy itself. Using generated code for different variable types makes perfect sense. Kind of like duck typing for C. :-) – Roland Smith Sep 03 '22 at 11:05
  • I owe you my best thanks, your answer makes things clear. I previously tried to learn [how to extend python with C code](https://realpython.com/build-python-c-extension-module/), hoping to get some insights from it, which eventually didn't work. I will read the related post and look into `core/src/umath/loops.c.src` first. Thank you again! – 27rabbit Sep 04 '22 at 09:06
  • Oooh, I found a very useful video from the answer to another identical question of mine on the NumPy mailing list: https://www.youtube.com/watch?v=mTWpBf1zewc. I will edit it into the question in case someone else sees this question in the future. – 27rabbit Sep 08 '22 at 07:40