5

I need to pass a scipy.sparse CSR matrix to a cython function. How do I specify the type, as one would for a numpy array?

Saullo G. P. Castro
vgoklani
  • do you want to quickly access the values of this matrix? perhaps it is better to pass a `coo_matrix` and use the properties `row`, `col` and `data` to access only the non-zero values, using 1-D array buffers in Cython to give you the fastest access... – Saullo G. P. Castro Aug 13 '14 at 20:28
  • That's precisely what I want. Could you please give me an example of how to do this? – vgoklani Aug 13 '14 at 20:34
  • According to the `csr` documentation, the component arrays are `data, indices, indptr`. The `coo` arrays, `data, row, col`, are easier to understand, but the `csr` ones are preferred for most math operations. – hpaulj Aug 13 '14 at 22:17

3 Answers

6

Here is an example of how to quickly access the data of a coo_matrix through its row, col and data properties. The example only aims to show how to declare the data types and create the typed buffers. It also sets the compiler directives that usually give a considerable speed boost:

#cython: boundscheck=False
#cython: wraparound=False
#cython: cdivision=True
#cython: nonecheck=False

import numpy as np
from scipy.sparse import coo_matrix
cimport numpy as np

# C-level element types for the typed buffers below
ctypedef np.int32_t cINT32
ctypedef np.double_t cDOUBLE

def print_sparse(m):
    # typed 1-D buffers give fast, C-level indexing inside the loop
    cdef np.ndarray[cINT32, ndim=1] row, col
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, coo_matrix):
        m = coo_matrix(m)
    # astype makes sure the dtypes match the declared buffer types
    row = m.row.astype(np.int32)
    col = m.col.astype(np.int32)
    data = m.data.astype(np.float64)
    for i in range(data.shape[0]):
        # single-argument print works under both Python 2 and 3 semantics
        print("%s %s %s" % (row[i], col[i], data[i]))
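
A side note on the `astype` calls above: by default `astype` returns a new array even when the dtype already matches. The following is a minimal sketch in plain Python/NumPy, assuming you would rather skip that copy when it is not needed. `copy=False` is a standard NumPy argument; the array names are only illustrative stand-ins for `m.row`.

```python
import numpy as np

# Stand-in for m.row of a coo_matrix (already int32 here).
row = np.array([0, 1, 2], dtype=np.int32)

copied = row.astype(np.int32)              # default: always a new array
shared = row.astype(np.int32, copy=False)  # dtype matches: no copy is made

print(np.shares_memory(row, copied))  # False
print(np.shares_memory(row, shared))  # True

# A different dtype still needs a conversion, so a new array is unavoidable.
widened = row.astype(np.int64, copy=False)
print(np.shares_memory(row, widened))  # False
```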
Saullo G. P. Castro
3

Building on @SaulloCastro's answer, add this function to the .pyx file to display the attributes of a csr matrix:

from scipy.sparse import csr_matrix  # needed in addition to the imports above

def print_csr(m):
    cdef np.ndarray[cINT32, ndim=1] indices, indptr
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, csr_matrix):
        m = csr_matrix(m)
    indices = m.indices.astype(np.int32)
    indptr = m.indptr.astype(np.int32)
    data = m.data.astype(np.float64)
    print(indptr)
    for i in range(data.shape[0]):
        # column index and value of the i-th stored entry
        print("%s %s" % (indices[i], data[i]))

indptr does not have the same length as data (it has one entry per row, plus one), so it can't be printed in the same loop.

To display the csr data like coo, you can do your own conversion with these iteration lines:

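    # declare j next to i (cdef int i, j) to keep the inner loop at C speed
    for i in range(indptr.shape[0] - 1):
        # stored entries of row i sit at positions indptr[i] .. indptr[i+1]-1
        for j in range(indptr[i], indptr[i+1]):
            print("%s %s %s" % (i, indices[j], data[j]))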
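
To make the role of indptr concrete, here is a small sketch of the same traversal in plain Python, with lists standing in for the three CSR arrays. The function name `csr_to_triplets` and the 3x3 matrix are made up for illustration only.

```python
def csr_to_triplets(data, indices, indptr):
    """Return a list of (row, col, value) for every stored CSR entry."""
    triplets = []
    n_rows = len(indptr) - 1  # indptr has one entry per row, plus one
    for i in range(n_rows):
        # Entries of row i live at positions indptr[i] .. indptr[i+1]-1.
        for j in range(indptr[i], indptr[i + 1]):
            triplets.append((i, indices[j], data[j]))
    return triplets


# Hypothetical 3x3 matrix:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
indices = [0, 2, 2, 0, 1]
indptr = [0, 2, 3, 5]

for triplet in csr_to_triplets(data, indices, indptr):
    print(triplet)
# (0, 0, 1.0)
# (0, 2, 2.0)
# (1, 2, 3.0)
# (2, 0, 4.0)
# (2, 1, 5.0)
```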

I assume you know how to set up and compile a pyx file.

Also, what does your cython function assume about the matrix? Does it know about the csr format? The coo format?

Or does your cython function want a regular numpy array? In that case, we are off on a rabbit trail. You just need to convert the sparse matrix to an array: x.toarray() (or x.A for short).

Saullo G. P. Castro
hpaulj
2

If you want to access the data directly (without copying), you need to specify the types in the function arguments:

#cython: boundscheck=False
#cython: wraparound=False
# (compiler directive comments only take effect before any code in the file)

import numpy as np
cimport numpy as np

def some_cython_func(np.ndarray[np.double_t] data, np.ndarray[int] indices, np.ndarray[int] indptr):
    # body of the function
    pass

Then you may call this function using

some_cython_func(M.data, M.indices, M.indptr)

where M is your CSR or CSC matrix.
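
A typed signature like the one above rejects arrays whose dtype does not match, so it can help to normalize the three arrays on the Python side before the call. This is a minimal sketch, assuming float64 data and int32 index arrays (which is what `np.double_t` and a C `int` correspond to on common platforms). `csr_components` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_components(M):
    """Return (data, indices, indptr) as float64 / int32 / int32 arrays.

    Hypothetical helper. Arrays that already have the right dtype and
    layout are passed through without copying; anything else is converted.
    Note that casting int64 indices of a very large matrix down to int32
    could overflow, so this sketch assumes a matrix of moderate size.
    """
    M = csr_matrix(M)  # converts other formats; a CSR input is not copied
    data = np.ascontiguousarray(M.data, dtype=np.float64)
    indices = np.ascontiguousarray(M.indices, dtype=np.int32)
    indptr = np.ascontiguousarray(M.indptr, dtype=np.int32)
    return data, indices, indptr


M = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
data, indices, indptr = csr_components(M)
print(data.dtype, indices.dtype, indptr.dtype)  # float64 int32 int32

# The arrays can then be handed to the compiled function, e.g.
# some_cython_func(data, indices, indptr)
```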

See this page for an explanation of passing arguments without casting.

SKV
  • exactly, so what type should I use for a scipy.sparse matrix? The type you defined in your example is a numpy matrix. – vgoklani Aug 21 '14 at 23:23
  • The CSR matrix is represented using three numpy arrays (data, indices and indptr) that are accessible. Here we are passing those arrays to our cython function as a representative of the CSR matrix. You can efficiently reconstruct the CSR matrix inside the cython function from these arrays. – SKV Aug 22 '14 at 16:42
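
The reconstruction mentioned in the last comment can be sketched in plain Python as follows. The `(data, indices, indptr)` constructor form is part of SciPy's documented `csr_matrix` interface; the variable names and the small matrix are illustrative. The shape is passed along explicitly here because the three arrays alone do not record the number of columns.

```python
import numpy as np
from scipy.sparse import csr_matrix

original = csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 0.0, 3.0]]))

# These are the three arrays one would hand to the Cython function.
data, indices, indptr = original.data, original.indices, original.indptr
shape = original.shape

# Rebuild a CSR matrix from the raw component arrays.
rebuilt = csr_matrix((data, indices, indptr), shape=shape)

print(np.array_equal(rebuilt.toarray(), original.toarray()))  # True
```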