Cython calling MKL crashing on elementwise multiplication via vdMul

Question

I can't figure out why this crashes when called from Python. It is just a simple Cython code to call Intel MKL's vdMul function https://software.intel.com/en-us/mkl-developer-reference-c-v-mul. I've tried copying every DLL from MKL into the directory and rewriting different parts but it keeps crashing although compiles fine. Posting here as I probably made an obvious error to someone more experienced working in C++. Here's the PYX code:

import numpy as np
cimport numpy as np
cimport cython
from cython cimport view

cdef extern from "mkl.h" nogil:
    double* vect_mult "vdMul"(int n,
                          double *a, 
                          double *b,
                          double *y) 

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef mult(double[::1] A, double[::1] B, double[:,::1] output):
    cdef int Ashape0=A.shape[0], Bshape0=B.shape[0]
    cdef int N = Ashape0*Bshape0
    with nogil:
        vect_mult(N, &A[0], &B[0], &output[0,0])

#test script
from cyblas import mult
import numpy as np
a=np.random.randn(1000)
b=np.random.randn(1000)
output = np.zeros((a.shape[0],b.shape[0]))
mult(a,b,output)

ead · Accepted Answer · 2017-07-05T13:52:38.647

1

I'm not sure, what do you want to do. As I understand the meaning of vdMult: its result is a n-dimensional vector with out[i]=a[i]*b[i]. So

output should be a flat array of size 1000.
by passing Ashape0*Bshape0 instead of min(Ashape0,Bshape0) you get a segmentation fault as the program tries to access arrays out of bounds.

You code should look something like this:

cpdef mult(double[::1] A, double[::1] B, double[::1] output):
    cdef int N = A.shape[0]#assuming all vectors have the same size
    with nogil:
        vect_mult(N, &A[0], &B[0], &output[0,0])

Edit: vdMult performs a point-wise multiplication. I assume what you want to do, is to calculate out=a*b^t, that is out[i][j]=a[i]*b[j].

So it is an usual matrix multiplication and you could use cblas_dgemm. In your case the call would be (n - the number of elements in vector a, m - the number of elements in b):

cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 
                n, m, 1, 1.0, A, 1, B, m, 0.0, C, m);

edited Jul 05 '17 at 13:52

answered Jul 04 '17 at 18:35

ead

32,758
6
90
153

I passed max(Ashape0, Bshape0) and it worked fine. Although since it is vector multiplication then it should just be either Ashape0 or Bshape0 since they should be equal. MKL is still 2x slower than NumPy, no idea why... but thank you! – Matt Jul 04 '17 at 20:53
To clarify the function (at least I thought) was element wise multiplication although this is producing otherwise – Matt Jul 04 '17 at 21:38
Thanks do you know what MKL function does element wise multiplication of vectors? Guess by your response I was unintentionally using the wrong one. – Matt Jul 05 '17 at 13:12
@Matt not sure, what you mean by element wise multiplication, but please look at my edit – ead Jul 05 '17 at 13:53
Thanks that is the library I also found that looked appropriate, thanks for confirming. – Matt Jul 05 '17 at 14:06
You are a genius! I didn't read your `dgemm` implementation for vectors which has been driving me crazy. Please post the above here so I can accept another answer from you! https://stackoverflow.com/questions/44980665/using-the-scipy-cython-blas-interface-from-cython-not-working-on-vectors-mx1-1xn for that answer it is a slight modification: if k==1: dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, m, 1, 1.0, a0, 1, b0, m, 0.0, c0, m) else: dgemm(CblasColMajor, transa, transb, m, n, k, alpha, b0, lda, a0, ldb, beta, c0, ldc) – Matt Jul 19 '17 at 23:42
Sorry the link I sent you to was the scipy based version - my MKL based version worked like a charm with your suggestion. – Matt Jul 20 '17 at 21:05

Cython calling MKL crashing on elementwise multiplication via vdMul

1 Answers1