
I'd like to allocate stack memory for a memoryview defined with a ctypedef, and return it as a numpy ndarray. This question discussed a few allocation methods, but the catch is that I don't know how to programmatically map my custom ctypedef to the corresponding numpy dtype or Python type code, which are needed for the allocation.

For example:

from cython cimport view
import numpy as np

ctypedef int value_type    # actual type subject to change

# np.empty requires me knowing that Cython int maps to np.int32
def test_return_np_array(size_t N):
    cdef value_type[:] b = np.empty(N, dtype=np.int32)
    b[0] = 12                # from ctypedef int ^
    return np.asarray(b)

# or, a Cython memoryview requires the type code 'i'
def test_return_np_array2(size_t N):
    cdef value_type[:] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")
    b[0] = 12                                               # from ctypedef int ^
    return np.asarray(b)

I'm using the typedef so that I can flexibly change the actual data type (say from int to long long) without having to modify all the code.

In pure Python, type checking is easy:

value_type = int
print(value_type is int)    # True
print(value_type is float)  # False

In numpy this can also be easily achieved by parameterizing the dtype as a string, like value_type = "int32", and then calling np.empty(N, dtype=value_type). With my ctypedef, Cython won't compile np.empty(N, dtype=value_type), complaining "'value_type' is not a constant, variable or function identifier". Is it possible to achieve something like this at compile time?

The user shouldn't have to manage the returned memory, so malloc won't be an option.

I came up with a hack using a C++ vector: <value_type[:N]>vector[value_type](N).data(), but this seems to cause memory errors (presumably because the temporary vector is destroyed at the end of the expression, leaving the view pointing at freed memory).

Yibo Yang
  • My workaround is to manually define both a runtime type and corresponding compile-time type, like `value_type = np.int32` and `ctypedef np.int32_t value_type_t` (which can be easily changed to `value_type = np.int64` and `ctypedef np.int64_t value_type_t`), based on the mapping in `/Cython/Includes/numpy/__init__.pxd`. It's also mentioned [here](http://docs.cython.org/en/latest/src/tutorial/numpy.html#adding-types). – Yibo Yang Jun 22 '18 at 15:50
  • You are right, this is the simplest solution to start with. I've updated my answer with some considerations that explain why a more robust approach might be worth the trouble. – ead Jun 23 '18 at 12:20

1 Answer


From C's point of view, np.int32 is not a type but a Python object, which is created at run time and thus cannot be used at compile time.

Probably the most robust way is this trick (for an explanation of the details see this SO question):

%%cython -a 

import numpy as np

def GET_SIGNED_NUMPY_TYPE():
    # build a 1-element typed memoryview over a stack variable
    # and let numpy report which dtype it maps to
    cdef int tmp
    return np.asarray(<int[:1]>(&tmp)).dtype

and now

>>> print(GET_SIGNED_NUMPY_TYPE())
int32

The advantage is that Cython's own infrastructure is used for the mapping, so no error-prone manual work is needed.
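Applied to the question's ctypedef, the same trick could look like this (a sketch; value_type is the ctypedef from the question):

%%cython

import numpy as np

ctypedef int value_type    # change to long long etc. as needed

def GET_VALUE_TYPE():
    # same trick: let numpy name the dtype of a view on a value_type
    cdef value_type tmp
    return np.asarray(<value_type[:1]>(&tmp)).dtype

Changing the ctypedef is then enough; the reported dtype follows automatically.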


A less enigmatic but also more error-prone approach: pick the right type via a lookup that runs once at run time, when the module is loaded:

%%cython
import numpy as np

ctypedef int value_type 

SIGNED_NUMPY_TYPE_MAP = {2 : np.int16, 4 : np.int32, 8 : np.int64}
SIGNED_NUMPY_TYPE = SIGNED_NUMPY_TYPE_MAP[sizeof(value_type)]

def zeros(N):
    return np.zeros(N, dtype=SIGNED_NUMPY_TYPE)

and now:

>>> print(zeros(1).dtype)
int32

Changing int to long long would lead to np.int64 being picked.

A similar approach could also be used for memory-views.
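For example, a minimal sketch reusing value_type and SIGNED_NUMPY_TYPE as defined in the cell above:

def empty_view(size_t N):
    # allocate with the dtype detected at import time,
    # then expose the buffer through the typed memoryview
    cdef value_type[:] b = np.empty(N, dtype=SIGNED_NUMPY_TYPE)
    b[0] = 12
    return np.asarray(b)

If the ctypedef changes, both the allocation dtype and the view type follow automatically.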


As you have pointed out, the Cython tutorial recommends mapping the types manually, e.g.:

ctypedef np.int32_t value_type
SIGNED_NUMPY_TYPE = np.int32

and then change both manually when needed. This simple solution is probably the best for smaller programs and prototypes. However, there are some considerations which could call for a more robust approach:

  • When both definitions are placed next to each other, it is easy to see that they have to be changed together. In a more complex program the two definitions can end up in different pxd or pyx files, and then it is only a matter of time until they get out of sync.

  • As long as fixed-size types (int32, int64) are used, the corresponding numpy type is obvious. However, for types like int and long it is not easy to tell:

    • int is only guaranteed to have at least 2 bytes and to be no larger than long. The compiler decides which size is picked. It might be slightly worrying that there is no guarantee, but the usual suspects (gcc, clang, icc and msvc) choose 4 bytes on the usual architectures.

    • long is already a pitfall: gcc chooses 8 bytes on Linux64, but with msvc long is only 4 bytes, so without knowing which compiler will be used one cannot choose between np.int32 and np.int64 in advance.

    • For the case of long there is np.int, which is quite confusing, because one would expect np.int to map to int and not to long! Yet on Linux64/gcc np.dtype(np.int).itemsize is 8 bytes while int is only 4 bytes; on Windows64/msvc, on the other hand, both np.int and int are 4 bytes wide (a quick check is sketched after this list).
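A sketch of such a check, in the same %%cython setup (np.int_ is numpy's scalar type corresponding to C long):

%%cython
import numpy as np

def report_sizes():
    # C type widths as seen by the compiler that built this module
    print("sizeof(int)  =", sizeof(int))
    print("sizeof(long) =", sizeof(long))
    # numpy's item size for its default integer type on this platform
    print("np.dtype(np.int_).itemsize =", np.dtype(np.int_).itemsize)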

ead