
Is there a Cythonic way to set a cdef array to zeros? I have a function with the following signature:

cdef cget_values(double[:] cpc_x, double[:] cpc_y):

The function is called as follows:

cdef double cpc_x [16]
cdef double cpc_y [16]
cget_values(cpc_x, cpc_y)

Now the first thing I would like to do is set everything in these arrays to zeros. Currently, I am doing that with a for loop as:

for i in range(16):
    cpc_x[i] = 0.0
    cpc_y[i] = 0.0

I was wondering if this is a reasonable approach without much overhead. I call this function a lot and was wondering if there is a more elegant/faster way to do this in cython.
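In pure Python/NumPy terms (using NumPy arrays as stand-ins for the cdef buffers above), the loop and its one-line slice-assignment equivalent look like this:

```python
import numpy as np

# Stand-ins for the cdef double[16] buffers from the question
cpc_x = np.full(16, 1.0)
cpc_y = np.full(16, 1.0)

# Element by element, mirroring the loop above
for i in range(16):
    cpc_x[i] = 0.0
    cpc_y[i] = 0.0

# Equivalent one-liner per array: slice assignment
cpc_x[:] = 0.0
cpc_y[:] = 0.0
```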

Luca
  • If you turn off wraparound and boundscheck, this is about as fast as you can make things. That said, usually allocating arrays isn't the slow part. Rather than making use of pointer arrays (which either require allocating data on the stack and risking a stack overflow, or using malloc/free and dealing with manual memory management), it is better to just use np.zeros and let numpy handle managing the memory. – ngoldbaum Apr 30 '18 at 00:55

1 Answer


I assume you are already using @cython.boundscheck(False), so there is not much you can do to improve its performance.

For readability reasons I would use:

cpc_x[:]=0.0
cpc_y[:]=0.0

Cython translates this into for-loops. Another advantage: even if @cython.boundscheck(False) isn't used, the resulting C code will nonetheless be free of bounds checks (__Pyx_RaiseBufferIndexError). Here is the resulting C code for a[:]=0.0:

  {
      double __pyx_temp_scalar = 0.0;
      {
          Py_ssize_t __pyx_temp_extent_0 = __pyx_v_a.shape[0];
          Py_ssize_t __pyx_temp_stride_0 = __pyx_v_a.strides[0];
          char *__pyx_temp_pointer_0;
          Py_ssize_t __pyx_temp_idx_0;
          __pyx_temp_pointer_0 = __pyx_v_a.data;
          for (__pyx_temp_idx_0 = 0; __pyx_temp_idx_0 < __pyx_temp_extent_0; __pyx_temp_idx_0++) {
            *((double *) __pyx_temp_pointer_0) = __pyx_temp_scalar;
            __pyx_temp_pointer_0 += __pyx_temp_stride_0;
          }
      }
  }
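The pointer arithmetic in the generated loop can be mimicked in pure Python to see what it computes: walk the buffer by strides[0] bytes and write the 8-byte pattern of 0.0 at each step. This is an illustrative sketch on a NumPy buffer, not the Cython code:

```python
import struct
import numpy as np

a = np.arange(5, dtype=np.float64)

extent = a.shape[0]            # like __pyx_temp_extent_0
stride = a.strides[0]          # like __pyx_temp_stride_0, in bytes (8 here)
raw = a.data.cast('B')         # raw, writable byte view of a's buffer

zero = struct.pack('d', 0.0)   # the 8-byte encoding of the scalar 0.0
ptr = 0                        # like __pyx_temp_pointer_0
for _ in range(extent):
    raw[ptr:ptr + 8] = zero    # *((double *) pointer) = scalar
    ptr += stride              # pointer += stride
```

After the loop, every element of a is 0.0, exactly as the generated C loop leaves the memoryview.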

What could improve the performance is to declare the memory views to be contiguous (i.e. double[::1] instead of double[:]). The resulting C code for a[:]=0.0 would then be:

  {
      double __pyx_temp_scalar = 0.0;
      {
          Py_ssize_t __pyx_temp_extent = __pyx_v_a.shape[0];
          Py_ssize_t __pyx_temp_idx;
          double *__pyx_temp_pointer = (double *) __pyx_v_a.data;
          for (__pyx_temp_idx = 0; __pyx_temp_idx < __pyx_temp_extent; __pyx_temp_idx++) {
            *((double *) __pyx_temp_pointer) = __pyx_temp_scalar;
            __pyx_temp_pointer += 1;
          }
      }
  }

As one can see, strides[0] is no longer used in the contiguous version: strides[0]=1 is known at compile time and the resulting C code can be better optimized (see for example here).
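The contiguity constraint is observable from Python: a double[::1] parameter only accepts C-contiguous buffers, while a strided view still matches double[:]. A small NumPy check (the flag names are NumPy's):

```python
import numpy as np

a = np.zeros(32, dtype=np.float64)
view = a[::2]  # strided view: every other element, not C-contiguous

print(a.flags['C_CONTIGUOUS'])     # True  -> acceptable for double[::1]
print(view.flags['C_CONTIGUOUS'])  # False -> only matches double[:]
```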


One could be tempted to get smart and use the low-level memset function:

from libc.string cimport memset
memset(&cpc_x[0], 0, 16*sizeof(double))

However, for bigger arrays there will be no difference compared to using a contiguous memory view (i.e. double[::1]; see here for example). There might be less overhead for smaller sizes, but I never cared enough to check.
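As a rough illustration of what the memset call does, the same zeroing can be performed from Python with ctypes on a NumPy buffer. This is a sketch, not the Cython code from the answer; it relies on the fact that all-zero bytes are the IEEE-754 encoding of 0.0:

```python
import ctypes
import numpy as np

cpc_x = np.full(16, 3.5, dtype=np.float64)

# Equivalent of memset(&cpc_x[0], 0, 16 * sizeof(double)):
# overwrite the whole buffer with zero bytes.
ctypes.memset(cpc_x.ctypes.data, 0, cpc_x.nbytes)
```

After the call, all 16 doubles read as 0.0.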

ead
  • Thanks! I ended up using the flattened notation. Cython is awesome. I cannot believe I waited so long to use it. I made my app 20x faster without much effort! – Luca Apr 30 '18 at 06:55