I've seen a similar post on Stack Overflow which tackles the problem in C++: Parallel implementation for multiple SVDs using CUDA. I want to do exactly the same in Python — is that possible? I have many matrices (approximately 8000, each of size 15x3), and I want to decompose each of them using the SVD. This takes ages on a CPU. Is it possible to do that in Python? My computer has an NVIDIA GPU installed. I have already looked at several libraries such as numba, pycuda, scikit-cuda, and cupy, but I didn't find a way to implement my plan with those libraries. I would be very glad for some help.
1 Answer
cuPy gives access to cuSolver, including a batched SVD:
https://docs.cupy.dev/en/stable/reference/generated/cupy.linalg.svd.html
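A minimal sketch of what that batched call looks like, under the assumption that cuPy is installed and a CUDA device is available (the snippet falls back to NumPy, whose `linalg.svd` broadcasts the same way, so the shapes shown hold either way):

```python
# Batched SVD of ~8000 matrices of shape 15x3 in a single call.
# cupy.linalg.svd broadcasts over the leading dimensions just like
# numpy.linalg.svd; fall back to NumPy if cuPy is not available.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

a = xp.random.rand(8000, 15, 3)

# full_matrices=False gives the "economy" SVD: u is (8000, 15, 3),
# s is (8000, 3), vt is (8000, 3, 3) -- one decomposition per matrix.
u, s, vt = xp.linalg.svd(a, full_matrices=False)

# Sanity check: each matrix reconstructs as (u * s) @ vt.
recon = (u * s[..., None, :]) @ vt
print(u.shape, s.shape, vt.shape)
print(bool(xp.allclose(a, recon)))
```

The point is that the batching happens inside the library (cuSolver's batched kernels on the GPU side); you never manage individual threads yourself.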

talonmies

- Yep, I have seen that one, but it only makes sense for big matrices. My problem is that I want each thread to perform one such SVD, and if I am not wrong, cuPy does not let me do that from Python (i.e. control each thread via its ID). – horsti Oct 30 '20 at 13:41
- I haven't tried it in cuPy, but in numpy the input array can have more than 2 dimensions (8000 x 15 x 3 in this case) and it broadcasts. Presumably cuPy behaves the same? https://numpy.org/doc/stable/reference/routines.linalg.html#routines-linalg-broadcasting – Stripedbass Oct 30 '20 at 23:16
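The broadcasting behaviour described in that comment can be verified in plain NumPy with the question's exact sizes:

```python
import numpy as np

# numpy.linalg.svd operates on the last two axes and broadcasts over
# the rest, so a stack of 8000 matrices is decomposed in one call.
stack = np.random.rand(8000, 15, 3)
u, s, vt = np.linalg.svd(stack, full_matrices=False)

# One set of singular values per matrix, sorted in descending order.
print(s.shape)                              # (8000, 3)
print(bool(np.all(s[:, :-1] >= s[:, 1:])))  # True
```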