I have a function that processes large numbers of fixed-length strings, and I'd like to speed it up with Cython. For a standard Cython function, I can declare the types of arrays like so:
import numpy as np

cpdef double[:] g(double[:] in_arr):
    cdef double[:] out_arr = np.zeros(in_arr.shape, dtype='float64')
    cdef Py_ssize_t i  # typed loop index; a bare "cdef i" is an untyped Python object
    for i in range(len(in_arr)):
        out_arr[i] = in_arr[i]
    return out_arr
This compiles and works as expected when the dtype is something simple like int32, float, double, etc. However, I cannot figure out how to create a typed memoryview of fixed-length strings, i.e. the equivalent of np.dtype('a5'), for example.
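To pin down what "fixed-length string" means here, this is a minimal NumPy-only sketch (plain Python, not Cython). It uses 'S5', which as far as I know is the same bytes dtype as 'a5' under the spelling newer NumPy versions prefer. The sample values 'ab' and '1234567' are made up for illustration.

```python
import numpy as np

# 'S5' is a bytes dtype with a fixed width of 5 bytes per element
dt = np.dtype('S5')

arr = np.array(['12345', '67890', 'ab', '1234567'], dtype=dt)

kind = dt.kind          # 'S' marks raw bytes, not Python objects
itemsize = dt.itemsize  # number of bytes stored per element
first = arr[0]          # exactly 5 characters, stored as-is
short = arr[2]          # shorter input is null-padded in storage
long_ = arr[3]          # longer input is truncated to 5 bytes
raw = arr.tobytes()     # the packed storage: 4 elements * 5 bytes
```

So each element is 5 raw bytes stored inline in the array buffer, rather than a pointer to a Python string.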
If I use this:
cpdef str[:] f(str[:] in_arr):
    # arr should be a numpy array of 5-character strings
    cdef str[:] out_arr = np.zeros(in_arr.shape, dtype='a5')
    cdef Py_ssize_t i  # typed loop index
    for i in range(len(in_arr)):
        out_arr[i] = in_arr[i]
    return out_arr
The function compiles, but this:
in_arr = np.array(['12345', '67890', '22343'], dtype='a5')
f(in_arr)
throws the following error:
---> 16 cpdef str[:] f(str[:] in_arr):
     17     # arr should be a numpy array of 5-character strings
     18     cdef str[:] out_arr = np.zeros(in_arr.shape, dtype='a5')

ValueError: Buffer dtype mismatch, expected 'unicode object' but got a string
Similarly, if I use bytes[:], it gives the error "Buffer dtype mismatch, expected 'bytes object' but got a string". And this doesn't even touch on the fact that nowhere am I specifying that these strings have length 5.
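One way to look at the mismatch from the Python side is to compare the buffer format NumPy exports for a fixed-length string array with the one it exports for an object array. This is only a sketch of my reading of the error message (that str[:] and bytes[:] ask for a buffer of Python object pointers). `obj_arr` is a hypothetical array added purely for comparison.

```python
import numpy as np

# The array from the failing call, spelled with 'S5' instead of 'a5'
in_arr = np.array(['12345', '67890', '22343'], dtype='S5')

# Hypothetical comparison array that really does hold Python bytes objects
obj_arr = np.array([b'12345', b'67890', b'22343'], dtype=object)

fixed_view = memoryview(in_arr)
fixed_format = fixed_view.format      # struct-style code for 5 raw bytes
fixed_itemsize = fixed_view.itemsize  # width of one element in bytes

object_format = memoryview(obj_arr).format  # code for a Python object pointer
```

The two buffers advertise different element formats, which would be consistent with the "Buffer dtype mismatch" error above.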
Interestingly, I can include fixed-length strings in a structured type as in this question, but I don't think that's the right way to declare the types.
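For reference, the structured-type idea can be sketched on the NumPy side as below. The field name `s` and the names `rec_dtype`, `rec_view` and `byte_view` are made up for illustration. I assume the Cython side would need a matching packed struct with a `char s[5]` member, which is not shown or tested here. The last line shows another possible layout, the same bytes viewed as an (n, 5) array of uint8.

```python
import numpy as np

in_arr = np.array(['12345', '67890', '22343'], dtype='S5')

# Hypothetical one-field record type wrapping the 5-byte string
rec_dtype = np.dtype([('s', 'S5')])

# Reinterpret the same memory as records; no data is copied
rec_view = in_arr.view(rec_dtype)
second = rec_view['s'][1]
shares = np.shares_memory(rec_view, in_arr)

# Alternative: expose the raw bytes as a 2-D array of unsigned 8-bit integers
byte_view = in_arr.view(np.uint8).reshape(in_arr.shape[0], in_arr.dtype.itemsize)
```

Both views alias the original buffer, so writes through either one would show up in `in_arr`.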