I would like to use the SVD routine from cuSolver (CUDA 7.0). I need to perform an SVD on each part of a matrix that I have split into blocks (for example, after dividing the matrix into a 2x2 grid of blocks, I want to run the four SVDs in parallel). The idea is to invoke the routine once per block of the subdivision, like so:

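// rows, cols, blockRows, blockCols: matrix and block dimensions (illustrative names)
for (int istart = 0; istart < rows; istart += blockRows) {
    for (int jstart = 0; jstart < cols; jstart += blockCols) {
        // call the cuSolver SVD on the block starting at (istart, jstart)
    }
}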
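
To make the subdivision above concrete, here is a minimal host-side sketch in C++ of how the blocks could be enumerated. It is not the cuSolver call itself. The `Block` type and the `enumerate_blocks` function are hypothetical names introduced only for illustration, and the sketch assumes column-major storage (the LAPACK convention) and that the matrix dimensions divide evenly by the grid dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sub-block of a column-major matrix (hypothetical helper type).
struct Block {
    int istart;          // first row of the block
    int jstart;          // first column of the block
    int rows;            // block height
    int cols;            // block width
    std::size_t offset;  // element offset of the block's top-left entry
};

// Enumerate the blocks of an m x n column-major matrix with leading
// dimension lda, split into a gridRows x gridCols grid of equal blocks.
// Assumption: m and n are exact multiples of gridRows and gridCols.
std::vector<Block> enumerate_blocks(int m, int n, int lda,
                                    int gridRows, int gridCols) {
    assert(m % gridRows == 0 && n % gridCols == 0);
    const int br = m / gridRows;  // rows per block
    const int bc = n / gridCols;  // columns per block
    std::vector<Block> blocks;
    for (int istart = 0; istart < m; istart += br) {
        for (int jstart = 0; jstart < n; jstart += bc) {
            // Column-major: element (i, j) lives at index j * lda + i.
            blocks.push_back({istart, jstart, br, bc,
                              static_cast<std::size_t>(jstart) * lda + istart});
        }
    }
    return blocks;
}
```

Each entry describes one independent SVD problem: the block starts at the base pointer plus `offset` and keeps the full matrix's leading dimension `lda`. How those independent problems are then issued concurrently is exactly the open question.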

Done this way, however, the calls are issued serially, not in parallel. Since these cuSolver functions cannot be invoked from inside a kernel, how can I parallelise these calls?

Robert Crovella
sim186

0 Answers