I am trying to write a run-length encoding arithmetic library for Python in Cython. Below you can see the declarations and the important parts of the algorithm's hot loop. The loop has two places with Python interaction: heavy on lines 73-74 and moderate on line 77. The C code generated for the heavy Python interaction is shown in a picture at the end. I will only ask about lines 73-74 here, as I imagine the fix for line 77 will be similar.
As you can see, the generated C code 1) does a lot of type casting, 2) uses richcompare and 3) uses getitemint. I do not understand why: 1) the types should be the same, 2) the comparison should be possible at the C level, since it only compares numbers of the same type, and 3) getitem should be superfluous, since I am just looking up an index in a C array.
How can I fix this to optimize my code? Is the problem that the numpy array declarations create Python objects, and that I need to pass a pointer to them in some way?
Here is the C code Cython generated for the dark-yellow and light-yellow places in my hot loop: