Can LAPACK routines from CUDA libraries (CULA or MAGMA) be called from within a CUDA kernel rather than from the host (that is, from __device__ functions and not only around __global__ kernel launches)? If that is not possible, how can linear algebra routines be implemented as __device__ functions in CUDA? My goal is to run several LAPACK functions (sgesvd, sgesv, ...) in parallel in CUDA, and in my application the calls have to be made from the device, not from the host.
-
What about MAGMA? And is it possible to find implementations of linear algebra routines (SVD, ...) written in CUDA C and callable from the device? – Didon Mar 01 '15 at 21:00
-
You cannot use CULA or MAGMA directly from GPU device code. AFAIK there is no well-known LAPACK-like library that is callable from GPU device code. The only linear algebra routines callable from device code, AFAIK, are in the CUBLAS library. "Is there a possibility to find linear algebra routines implementations (SVD..) written in CUDA C and callable from the device?" is a very broad question. Asking for references to offsite resources is off-topic for SO. – Robert Crovella Mar 01 '15 at 21:36
-
cuBLAS routines, which include routines for calculating the LU decomposition, can be called from within device code. The SVD provided by the cuSOLVER library cannot be called from within device code. – Vitality Mar 01 '15 at 21:41
-
Alright, but cuBLAS is BLAS for CUDA, covering matrix-vector and matrix-matrix operations. The application that I want to implement uses SVD, system solving by LU factorisation, and eigendecomposition. From what you just said, I understand that the only way to use these functions in parallel on the device is to implement them in C and call them from parallel threads. – Didon Mar 01 '15 at 21:46
-
@JackOlantern So no library provides an SVD that can be called from within the device. Is it possible to use the C implementation of SVD from Numerical Recipes and the LU decomposition from cuBLAS within the same kernel? – Didon Mar 01 '15 at 21:50
-
Concerning the SVD, my personal experience is that if you recycle sequential code on the device to calculate many SVDs in parallel, you will not be able to outperform any smart sequential implementation. – Vitality Mar 01 '15 at 21:56
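To illustrate the "write the routine yourself and call it from each thread" approach discussed in the comments, here is a minimal sketch, not taken from any library. It assumes small, dense, row-major single-precision systems and solves one system per thread with Gaussian elimination and partial pivoting. The names small_lu_solve, solve_many, and HOST_DEVICE are hypothetical. The kernel is only compiled under nvcc; the solver itself also builds as plain C++, so it can be checked on the host. As cautioned above, recycling sequential code this way is unlikely to be fast, and SVD or eigendecomposition would need considerably more work.

```cpp
#include <math.h>
#include <stddef.h>

// Under nvcc, make the routine callable from host and device code;
// with a plain C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: solves A x = b for one small dense n-by-n system.
// A is row-major and is overwritten; b is overwritten with the solution x.
// Returns 0 on success, or k + 1 if no nonzero pivot exists in column k.
HOST_DEVICE inline int small_lu_solve(float* A, float* b, int n)
{
    for (int k = 0; k < n; ++k) {
        // Partial pivoting: pick the row with the largest entry in column k.
        int p = k;
        float maxval = fabsf(A[k * n + k]);
        for (int i = k + 1; i < n; ++i) {
            float v = fabsf(A[i * n + k]);
            if (v > maxval) { maxval = v; p = i; }
        }
        if (maxval == 0.0f) return k + 1;  // singular to working precision

        // Swap rows k and p of A and the matching entries of b.
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                float t = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = t;
            }
            float t = b[k]; b[k] = b[p]; b[p] = t;
        }

        // Eliminate column k from the rows below the pivot.
        for (int i = k + 1; i < n; ++i) {
            float f = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; ++j) A[i * n + j] -= f * A[k * n + j];
            b[i] -= f * b[k];
        }
    }

    // Back substitution on the resulting upper-triangular system.
    for (int i = n - 1; i >= 0; --i) {
        float s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i * n + j] * b[j];
        b[i] = s / A[i * n + i];
    }
    return 0;
}

#ifdef __CUDACC__
// One thread per system: thread t solves the t-th matrix / right-hand side.
// As holds batch matrices back to back (n*n floats each), bs holds batch
// right-hand sides (n floats each), info receives one status per system.
__global__ void solve_many(float* As, float* bs, int* info, int n, int batch)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < batch)
        info[t] = small_lu_solve(As + (size_t)t * n * n, bs + (size_t)t * n, n);
}
#endif
```

A launch such as solve_many<<<(batch + 127) / 128, 128>>>(d_As, d_bs, d_info, n, batch) would then solve all systems concurrently, with d_As, d_bs, and d_info being device buffers allocated by the caller (hypothetical names). When the solves can be issued from the host instead, a batched library routine is usually the better choice.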