I am currently working on improving the runtime for a simple Cython function to multiply a numpy matrix A
and a numpy vector x
using BLAS (i.e. runs A.dot.x
in normal numpy)
My current implementation matrix_multiply(A,x)
does this without copying the data:
import cython
import numpy as np
cimport numpy as np
cimport scipy.linalg.cython_blas as blas
DTYPE = np.float64
ctypedef np.float64_t DTYPE_T
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def matrix_multiply(np.ndarray[DTYPE_T, ndim=2, mode="fortran"] A, np.ndarray[DTYPE_T, ndim=1, mode="fortran"] x):
#calls dgemv from BLAS which computes y = alpha * trans(A) + beta * y
#see: http://www.nag.com/numeric/fl/nagdoc_fl22/xhtml/F06/f06paf.xml
cdef int N = A.shape[0]
cdef int D = A.shape[1]
cdef int lda = N
cdef int incx = 1 #increments of x
cdef int incy = 1 #increments of y
cdef double alpha = 1.0
cdef double beta = 0.0
cdef np.ndarray[DTYPE_T, ndim=1, mode = "fortran"] y = np.empty(N, dtype = DTYPE)
blas.dgemv("N", &N, &D, &alpha, &A[0,0], &lda, &x[0], &incx, &beta, &y[0], &incy)
return y
I am wondering how I can change this so that it computes A(:,selected).dot.x
instead of A.dot.x
, where selected
is a set of ordered indices of the columns.
I am open to any implementation, though I suppose that the easiest way would be to change the function header to matrix_multiply(A,x,selected)
so it also expects selected
as an input. I believe that the answer has to use memory views, but I am not sure.