I want to call the sparse matrix multiplication function from the cuSPARSE library inside a kernel, instead of calling it from the host side. I wrote a __device__
function to do this. I am using CUDA 11.3 on a V100. My code follows this example from NVIDIA's CUDALibrarySamples:
spmm_csr
But compilation fails with:
error: calling a __host__ function("cusparseSpMM") from a __device__ function("spmm_csr") is not allowed
How can I call it from a __device__ function?
Or are there other ways to implement sparse matrix multiplication inside a kernel?
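
To make the second question concrete, here is a rough sketch of the kind of hand-written fallback I have in mind. It is not taken from the NVIDIA sample. The names spmm_csr_row and spmm_csr_kernel, the parameter layout (A in CSR, B and C dense row-major), and the one-thread-per-row mapping are all my own assumptions, and I have not tuned it for performance.

```cpp
// Sketch only: hypothetical hand-written CSR SpMM, C = A * B.
// Assumptions: A is m x k in CSR form, B is k x n dense row-major,
// C is m x n dense row-major. Not part of cuSPARSE or the NVIDIA sample.
#ifdef __CUDACC__
#define HD __host__ __device__   // callable from kernels when built with nvcc
#else
#define HD                       // plain C++ build, useful for host-side checks
#endif

// Computes one output row of C from the nonzeros of row `row` of A.
HD inline void spmm_csr_row(int row, int n,
                            const int* row_offsets,  // length m + 1
                            const int* col_indices,  // length nnz
                            const float* values,     // length nnz
                            const float* B, float* C)
{
    for (int j = 0; j < n; ++j) {
        float acc = 0.0f;
        // Walk the nonzeros of this row of A.
        for (int p = row_offsets[row]; p < row_offsets[row + 1]; ++p) {
            acc += values[p] * B[col_indices[p] * n + j];
        }
        C[row * n + j] = acc;
    }
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread handles one row of A.
__global__ void spmm_csr_kernel(int m, int n,
                                const int* row_offsets,
                                const int* col_indices,
                                const float* values,
                                const float* B, float* C)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m) {
        spmm_csr_row(row, n, row_offsets, col_indices, values, B, C);
    }
}
#endif
```

One thread per row is the simplest mapping I could think of. I assume it will be far slower than cusparseSpMM for large or unbalanced matrices, which is why I would prefer to reuse the library if that is possible.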