I want to build a Cython class that holds a variable number of typed memoryviews. I want to be able to shuffle and reorder the memoryviews without copying large amounts of data, while keeping the benefit of fixing the base type of the arrays (`double`) at compile time.

I have not managed to do this efficiently so far.

Here is the baseline class definition:

    import numpy as np
    cimport numpy as np

    cdef class Container:
        # a single typed memoryview of C doubles
        cdef double[:] ary

        def __cinit__(self, long n):
            # np.float was removed in NumPy 1.24; np.float64 is the C double type
            self.ary = np.zeros(n, dtype=np.float64)

        cpdef void fill(self, double v):
            cdef long i
            for i in range(len(self.ary)):
                self.ary[i] = i + v

When I run a speed benchmark in IPython as follows:

    a = Container(1000)
    %timeit a.fill(10)

I get 550 ns per call to `fill`.


Now I want to have several memoryviews inside my class, so I tried this instead:

    import numpy as np
    cimport numpy as np

    cdef class Container2:
        # object memoryview holding references to n separate float64 arrays
        cdef object[:] ary

        def __cinit__(self, long n, long m):
            cdef Py_ssize_t i
            # np.object was removed in NumPy 1.24; use the builtin object
            self.ary = np.empty(n, dtype=object)
            for i in range(n):
                self.ary[i] = np.zeros(m, dtype=np.float64)

        cpdef void fill(self, double v):
            cdef long i, j
            cdef double[:] sl
            for i in range(len(self.ary)):
                # view the i-th array through a typed double memoryview
                sl = self.ary[i]
                for j in range(len(self.ary[i])):
                    sl[j] = j + v

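`Container2` is the same as `Container`, except that it holds a variable number of arrays instead of just one. The arrays are stored in an object memoryview, and each one is converted to a typed `double[:]` memoryview inside `fill`.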
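The following pure-Python/NumPy sketch may help make the shuffling requirement concrete. It mirrors the data layout of `Container2`. The helper names `make_container2` and `fill` are hypothetical, and the code is far slower than the Cython version. It checks that permuting the object array only rearranges references to the existing float64 arrays, so none of the array data is copied.

```python
import numpy as np

def make_container2(n, m):
    # Hypothetical helper mirroring Container2.__cinit__: an object array
    # of n references, each pointing to its own float64 array of length m.
    ary = np.empty(n, dtype=object)
    for i in range(n):
        ary[i] = np.zeros(m, dtype=np.float64)
    return ary

def fill(ary, v):
    # Same arithmetic as Container2.fill, in plain (slow) Python.
    for sl in ary:
        for j in range(sl.shape[0]):
            sl[j] = j + v

ary = make_container2(3, 4)
fill(ary, 10.0)
originals = list(ary)  # keep references to the original inner arrays

rng = np.random.default_rng(0)
# Fancy indexing builds a new object array, but it only copies references:
# each element is still one of the original array objects.
shuffled = ary[rng.permutation(len(ary))]
print(all(any(s is o for o in originals) for s in shuffled))  # True
```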
When I run the benchmark again using

    b = Container2(10, 1000)
    %timeit b.fill(10)

I now get 9.5 µs per call. Because `b` holds 10 arrays of size 1000, this corresponds to 950 ns per array, compared with 550 ns for the single array in `Container`. The processing time per array has therefore almost doubled (about 1.7×).
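Written out, the per-array comparison is:

$$
\frac{9.5\ \mu\text{s}}{10\ \text{arrays}} = 950\ \text{ns per array}, \qquad \frac{950\ \text{ns}}{550\ \text{ns}} \approx 1.73
$$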
Surely there must be a better and more efficient way.
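One commonly used alternative layout is sketched below. It is an illustration under stated assumptions and has not been benchmarked. The idea is to keep all arrays as rows of a single C-contiguous 2D `float64` buffer and to store the ordering separately as an integer index array. Shuffling then permutes `n` integers instead of moving `n*m` doubles. In Cython, the two attributes could be declared as typed memoryviews (`double[:, ::1]` and `Py_ssize_t[::1]`), so the element type would still be fixed at compile time. This layout assumes every array has the same length `m`, as in `Container2`. The class name `Container3` and its methods are hypothetical. Because the code is plain Python/NumPy, it shows the data layout rather than the speed.

```python
import numpy as np

class Container3:
    """Hypothetical sketch: all rows live in one C-contiguous (n, m) float64
    buffer, and `order` maps logical position -> physical row. Reordering
    permutes only `order`, never the underlying doubles."""

    def __init__(self, n, m):
        # In Cython these could be: cdef double[:, ::1] data
        #                           cdef Py_ssize_t[::1] order
        self.data = np.zeros((n, m), dtype=np.float64)
        self.order = np.arange(n, dtype=np.intp)

    def shuffle(self, rng):
        # Permute the index array in place; the data buffer is untouched.
        rng.shuffle(self.order)

    def row(self, k):
        # Logical row k is a view (no copy) of physical row order[k].
        return self.data[self.order[k]]

    def fill(self, v):
        # Same per-element arithmetic as Container.fill, applied to every row.
        m = self.data.shape[1]
        for i in range(self.order.shape[0]):
            r = self.order[i]
            for j in range(m):
                self.data[r, j] = j + v

c = Container3(3, 4)
c.fill(10.0)
before = c.data.ctypes.data          # address of the data buffer
c.shuffle(np.random.default_rng(0))
print(c.data.ctypes.data == before)  # True: the buffer was not reallocated
print(sorted(c.order.tolist()))      # [0, 1, 2]
```

One limitation of this layout is that rows cannot be added or resized without reallocating the whole 2D buffer.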