I am trying to compute A^TA using cuSparse, where A is a large but sparse matrix. Based on the documentation, the proper function to use is cusparseDcsrgemm2. However, this is one of the few cuSparse operations that doesn't support an optional built-in transpose for the input matrix. The documentation says:
Only the NN version is supported. For other modes, the user has to transpose A or B explicitly.
The problem is that I couldn't find a function in cuSparse that performs a transpose. I know I can transpose on the CPU and copy the result to the GPU, but that will slow down the application. Am I missing something? What is the right way to use cuSparse to compute A^TA?
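For reference, the host-side transpose I would like to avoid looks roughly like the sketch below. The `Csr` struct and `transposeCsr` function are names made up for illustration and are not part of cuSparse; the sketch assumes zero-based CSR storage with double-precision values.

```cpp
#include <vector>

// Hypothetical CSR container (illustrative only, not a cuSparse type).
struct Csr {
    int rows = 0;
    int cols = 0;
    std::vector<int> rowPtr;   // size rows + 1
    std::vector<int> colInd;   // size nnz, zero-based column indices
    std::vector<double> val;   // size nnz
};

// CPU transpose of a CSR matrix using a counting sort on column indices.
Csr transposeCsr(const Csr& a) {
    const int nnz = static_cast<int>(a.val.size());
    Csr t;
    t.rows = a.cols;
    t.cols = a.rows;
    t.rowPtr.assign(t.rows + 1, 0);
    t.colInd.resize(nnz);
    t.val.resize(nnz);

    // Count the entries in each column of A (each row of A^T).
    for (int k = 0; k < nnz; ++k) {
        ++t.rowPtr[a.colInd[k] + 1];
    }
    // Prefix sum turns the counts into row offsets of A^T.
    for (int r = 0; r < t.rows; ++r) {
        t.rowPtr[r + 1] += t.rowPtr[r];
    }

    // Scatter each entry (i, j) of A into row j of A^T.
    std::vector<int> next(t.rowPtr.begin(), t.rowPtr.end() - 1);
    for (int i = 0; i < a.rows; ++i) {
        for (int k = a.rowPtr[i]; k < a.rowPtr[i + 1]; ++k) {
            const int dst = next[a.colInd[k]]++;
            t.colInd[dst] = i;
            t.val[dst] = a.val[k];
        }
    }
    return t;
}
```

With this approach, both A and the transposed arrays have to be built on the host and copied to the device before the multiplication, which is the extra work and transfer I am hoping to avoid.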