I need to pass a scipy.sparse CSR matrix to a cython function. How do I specify the type, as one would for a numpy array?

- Do you want to quickly access the values of this matrix? Perhaps it is better to pass a `coo_matrix` and use the properties `row`, `col` and `data` to access only the non-zero values, using 1-D array buffers in Cython to give you the fastest access... – Saullo G. P. Castro Aug 13 '14 at 20:28
- That's precisely what I want. Could you please give me an example of how to do this? – vgoklani Aug 13 '14 at 20:34
- According to the `csr` documentation, the component arrays are `data, indices, indptr`. The `coo` arrays, `data, row, col`, are easier to understand, but the `csr` ones are preferred for most math operations. – hpaulj Aug 13 '14 at 22:17
3 Answers
Here is an example of how to quickly access the data from a `coo_matrix` using the properties `row`, `col` and `data`. The purpose of the example is just to show how to declare the data types and create the buffers (also adding the compiler directives that will usually give you a considerable boost)...
#cython: language_level=3
#cython: boundscheck=False
#cython: wraparound=False
#cython: cdivision=True
#cython: nonecheck=False
import numpy as np
from scipy.sparse import coo_matrix
cimport numpy as np

ctypedef np.int32_t cINT32
ctypedef np.double_t cDOUBLE

def print_sparse(m):
    # Typed 1-D buffers give fast indexed access inside the loop
    cdef np.ndarray[cINT32, ndim=1] row, col
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, coo_matrix):
        m = coo_matrix(m)
    # astype() returns copies whose dtypes match the declared buffer types
    row = m.row.astype(np.int32)
    col = m.col.astype(np.int32)
    data = m.data.astype(np.float64)
    for i in range(np.shape(data)[0]):
        print(row[i], col[i], data[i])
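
To make the role of the three COO arrays concrete, here is a minimal sketch in plain Python. The helper `dense_to_coo` is hypothetical (it is not part of SciPy), and ordinary lists stand in for the NumPy arrays `m.row`, `m.col` and `m.data`, so the snippet runs without SciPy or Cython.

```python
def dense_to_coo(dense):
    """Return (row, col, data) lists describing the non-zero entries
    of a matrix given as a list of equal-length rows."""
    row, col, data = [], [], []
    for i, values in enumerate(dense):
        for j, value in enumerate(values):
            if value != 0:
                row.append(i)              # row index of the stored value
                col.append(j)              # column index of the stored value
                data.append(float(value))  # the value itself
    return row, col, data


dense = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 6],
]
row, col, data = dense_to_coo(dense)

# Same traversal as the loop in print_sparse: one triplet per stored value.
for i in range(len(data)):
    print(row[i], col[i], data[i])
```

All three lists have the same length (one entry per stored value), which is why a single loop index is enough to walk them together.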

Building on @SaulloCastro's answer, add this function to the `.pyx` file to display the attributes of a `csr` matrix:
from scipy.sparse import csr_matrix

def print_csr(m):
    cdef np.ndarray[cINT32, ndim=1] indices, indptr
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, csr_matrix):
        m = csr_matrix(m)
    indices = m.indices.astype(np.int32)
    indptr = m.indptr.astype(np.int32)
    data = m.data.astype(np.float64)
    print(indptr)
    # indices and data hold one entry per stored value
    for i in range(np.shape(data)[0]):
        print(indices[i], data[i])
`indptr` does not have the same length as `data`, so it can't be printed in the same loop.
To display the `csr` data like `coo`, you can do your own conversion with these iteration lines:
for i in range(np.shape(indptr)[0]-1):
    # the stored values of row i sit at positions indptr[i] .. indptr[i+1]-1
    for j in range(indptr[i], indptr[i+1]):
        print(i, indices[j], data[j])
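
The same row-by-row walk can be tried out in plain Python. This is only a sketch: `csr_to_coo` is a hypothetical helper name, and the lists below are hand-written CSR arrays for the 3x3 matrix `[[1, 0, 2], [0, 0, 3], [4, 5, 6]]`, standing in for `m.data`, `m.indices` and `m.indptr`.

```python
def csr_to_coo(data, indices, indptr):
    """Expand CSR arrays into a list of (row, col, value) triplets."""
    triplets = []
    n_rows = len(indptr) - 1  # indptr has one more entry than there are rows
    for i in range(n_rows):
        # The stored values of row i occupy positions indptr[i] .. indptr[i+1]-1.
        for j in range(indptr[i], indptr[i + 1]):
            triplets.append((i, indices[j], data[j]))
    return triplets


# CSR arrays for [[1, 0, 2], [0, 0, 3], [4, 5, 6]]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
indices = [0, 2, 2, 0, 1, 2]   # column index of each stored value
indptr = [0, 2, 3, 6]          # row i spans data[indptr[i]:indptr[i+1]]

triplets = csr_to_coo(data, indices, indptr)
for row_index, col_index, value in triplets:
    print(row_index, col_index, value)
```

Here `indptr` has four entries for three rows while `data` has six, which is the length mismatch described above.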
I assume you know how to set up and compile a `pyx` file.
Also, what does your `cython` function assume about the matrix? Does it know about the `csr` format? The `coo` format?
Or does your `cython` function want a regular `numpy` array? In that case, we are off on a rabbit trail. You just need to convert the sparse matrix to an array: `x.toarray()` (or `x.A` for short, on SciPy versions that still provide that attribute).

If you want to access the data directly (without a copy) you need to specify the type in the function argument:
# Directive comments only take effect at the top of the file, before any code
#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np

def some_cython_func(np.ndarray[np.double_t] data, np.ndarray[int] indices, np.ndarray[int] indptr):
    # body of the function
    pass
Then you may call this function using
some_cython_func(M.data, M.indices, M.indptr)
where `M` is your CSR or CSC matrix.
See this page for an explanation of passing arguments without casting.

- Exactly, so what type should I use for a scipy.sparse matrix? The type you defined in your example is a numpy matrix. – vgoklani Aug 21 '14 at 23:23
- The CSR matrix is represented using three numpy arrays (data, indices and indptr) that are accessible. Here we are passing those arrays to our cython function as a representative of the CSR matrix. You can efficiently reconstruct the CSR matrix inside the cython function from these arrays. – SKV Aug 22 '14 at 16:42
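
As an illustration of a routine that works directly on the three arrays, here is a minimal plain-Python sketch of a CSR matrix-vector product. The name `csr_matvec` and the sample arrays are assumptions made for this example; lists stand in for `M.data`, `M.indices` and `M.indptr`, and a Cython version would add typed arguments as in the answer above.

```python
def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A given by its CSR arrays."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        total = 0.0
        # Only the stored (non-zero) values of row i contribute.
        for j in range(indptr[i], indptr[i + 1]):
            total += data[j] * x[indices[j]]
        y[i] = total
    return y


# CSR arrays for [[1, 0, 2], [0, 0, 3], [4, 5, 6]]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
indices = [0, 2, 2, 0, 1, 2]
indptr = [0, 2, 3, 6]

y = csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0])
print(y)
```

The function never builds a matrix object: the three arrays carry everything it needs, which is the point made in the last comment.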