Fastest way to get subset of numpy array in Cython

Question

I have a Cython function that takes a 2D np.ndarray (NumPy array) of floats and returns a 1D NumPy array whose length is the same as that of the input 2D array.

import numpy as np
cimport numpy as np

np.import_array()
cimport cython
def func(np.ndarray[np.float_t, ndim=2] input_arr):
   cdef np.ndarray[np.float_t, ndim=1] new_arr = ...
   # do stuff
   return new_arr

In another loop in the program, I want to call func, but pass it a 2d array that is created dynamically from another 2d array. Right now I have:

my_2d_numpy_array = np.array([[0.5, 0.1], [0.1, 10]]) # assume this is defined
cdef int N = 10000
cdef int k
for j in xrange(N):
  # find some element k of interest
  # create a 2d array on the fly containing just the k-th row and pass it to func()
  func(np.array([my_2d_numpy_array[k]], dtype=float))  # KEY LINE

This works, but I think calling np.array on every iteration adds a large overhead, because it goes back to Python. Since func only reads the array and doesn't modify it, how can I pass it a view of the array (as a pointer, say) without going back to Python to build a new array? I only need to pull out the kth row of my_2d_numpy_array and pass it to func().

Update: A related question: if I am using an np.ndarray inside the loop but don't need its full functionality in func, can I make func take something like a static C array instead and somehow treat the np.ndarray as that? Will that save costs? Presumably you then don't have to pass an object to func (an np.ndarray is an object).

Are you sure `np.array` goes back to the interpreter? It's a built-in function. — user2357112, Feb 24 '14 at 05:31
@user2357112: I think np.array goes back to Python? Not sure? added related question to this — , Feb 24 '14 at 05:38

score 5 · Accepted Answer · edited May 23 '17 at 12:00

You want to use Cython memory views. They are designed for passing array slices between functions that are part of the same Cython module. You may need to inline the function within your Cython module to get the full performance benefit, but that isn't always necessary. You can take a look at the documentation. I recently wrote a rather lengthy answer to another question that looks into when memory views should be used. If you want a more detailed examination of why slicing works well with memory views, have a look at this blog post.

If you don't use memory views, the slicing involving NumPy arrays still involves a Python call and is not performed in C.

For your specific case, here are a few thoughts: If you are passing array slices between functions in your Cython module you should be able to use a memory view to pass the slices. This approach does depend on compile-time optimizations, so if you need to pass an array between two functions that are compiled at separate times, you will have to use a pointer to pass data between functions. This will mean doing some careful pointer arithmetic, but it should still work. If you need to do slicing and use NumPy functions, you may just end up having to use NumPy arrays, but it could be worth trying to use NumPy arrays and memory views that view the same data. That way you will be able to pass slices as memory views, while only having to create NumPy arrays when you really need them.
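The last point, a NumPy array and a memory view looking at the same data, can be illustrated without a compile step. This is only a rough sketch: it uses Python's built-in memoryview as a stand-in for a Cython typed memory view (such as double[:]), on the assumption that the buffer protocol both rely on is what matters here. The array values and the index k are illustrative.

```python
import numpy as np

data = np.array([[0.5, 0.1], [0.1, 10.0]])
k = 1  # hypothetical row of interest

# Built-in memoryview over row k: it exposes the row's buffer and copies nothing.
row_mv = memoryview(data[k])

# Wrap the same buffer in a NumPy array only when NumPy functions are needed.
row_arr = np.asarray(row_mv)

# A write through the wrapper shows up in the original array.
row_arr[0] = 7.0
print(data[k, 0])                       # 7.0
print(np.shares_memory(row_arr, data))  # True
```

In compiled Cython code the same idea would use a typed memory view in place of the built-in one, so the slice can be passed around at C level and converted with np.asarray only where a NumPy array is really required.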

Also, I would recommend making the function func a C-function so that you don't have to go through the overhead of calling a Python function when you call it. You can do that by using the cdef or cpdef keyword to declare it. Use cdef if you don't need to call it from outside the module. Use cpdef if you want a C function and a corresponding Python wrapper that is accessible to Python.

score 0 · Answer 2 · answered Feb 24 '14 at 05:32


func(my_2d_numpy_array[k:k+1])

Slicing my_2d_numpy_array instead of indexing it gets you the view you wanted with the shape you wanted.
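A quick way to check this from plain Python, as a small sketch (the array values and the index k are only illustrative), is to compare the slice with the np.array call from the question using np.shares_memory:

```python
import numpy as np

my_2d_numpy_array = np.array([[0.5, 0.1], [0.1, 10.0]])
k = 1  # hypothetical row of interest

# Slicing keeps the 2D shape and returns a view onto the original buffer.
row_view = my_2d_numpy_array[k:k + 1]

# The approach from the question builds a new array and copies the row into it.
row_copy = np.array([my_2d_numpy_array[k]], dtype=float)

print(row_view.shape, row_copy.shape)                 # (1, 2) (1, 2)
print(np.shares_memory(row_view, my_2d_numpy_array))  # True
print(np.shares_memory(row_copy, my_2d_numpy_array))  # False
```

Both results have the shape func expects, but only the slice avoids copying the data.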


user2357112



Can you please comment on whether this is different from ``np.array``? I was worried that using Python-style indexing might also invoke the Python procedures of NumPy indexing. I want to make sure my indexing is pure C – Feb 24 '14 at 13:27
@user248237dfsf: The slicing and indexing handling of a numpy array is done in pure C. – user2357112 Feb 24 '14 at 13:41
right but sometimes Cython code when not done right can make it so you're going to Python only to then call a numpy function which is in C, so I'm trying to avoid that. I know that numpy implements the indexing function in C – Feb 24 '14 at 14:53

Slicing a NumPy array is *not* a C operation. That calls the same slicing method as it does in Python. – IanH Feb 27 '14 at 00:06
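A small sketch of what that comment refers to, run from plain Python rather than compiled Cython (the array values are illustrative): a slice expression is dispatched to the array's __getitem__ method and produces a fresh ndarray object each time, even though no data is copied. It does not measure the cost of that call.

```python
import numpy as np

arr = np.array([[0.5, 0.1], [0.1, 10.0]])

# arr[0:1] is sugar for arr.__getitem__(slice(0, 1)).
first = arr[0:1]
second = arr.__getitem__(slice(0, 1))

print(type(first).__name__)  # ndarray
print(first is second)       # False: each slice is a separate wrapper object
print(first.base is arr)     # True: the wrapper still points at arr's data
```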
