CUDA thread allocation

Question

I am trying to implement an iterative linear solver, the Conjugate Gradient solver, in CUDA. It solves equations of the form

A*x=b,

where A is a sparse symmetric positive definite matrix of size n x n, x is the unknown vector of size n with an initial guess of 0, and b is the right-hand-side vector of size n.

My code includes several operations, such as sparse matrix-vector multiplication and vector-vector operations.

My code works fine for matrix sizes up to 31 x 31, but not for anything larger. This may be because of the number of threads allocated to a kernel function. I am allocating threads as

mul<<<1,nrows>>>()

Here mul is the kernel that performs the sparse matrix-vector multiplication, and nrows is the number of rows in the sparse matrix A.

Is this problem related to the warp size (1 warp = 32 threads)?

If anyone knows, please advise.

Thank you..!!

Without any code this question is impossible to answer without speculation. Please include a [mcve]. — havogt, Mar 22 '16 at 09:50

score -2 · Answer 1 · answered Mar 22 '16 at 13:19

Try running the "deviceQuery" program from the NVIDIA CUDA Samples to see the warp size reported for your device. If it shows a warp size of 32, your problem may be related to it; otherwise, a specific code snippet is needed before anyone can suggest a solution.
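If building the samples is inconvenient, the same numbers can be read directly through the runtime API. The following is only a sketch, not the asker's code: mulSketch is a hypothetical stand-in for mul, CSR storage (rowPtr, colIdx, val) is an assumption, blocksFor is a made-up helper, and the diagonal test matrix is invented purely so the program runs. It prints warpSize and maxThreadsPerBlock, then launches with several threads per block and a bounds check, so the launch does not depend on nrows fitting in one block. Note that a single block can normally hold more than one warp, so a failure right at 32 rows may well come from the kernel body rather than from the launch line.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: smallest block count with blocks * threadsPerBlock >= n.
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical stand-in for the asker's mul kernel: one thread per row,
// matrix assumed to be stored in CSR form.
__global__ void mulSketch(int nrows, const int* rowPtr, const int* colIdx,
                          const float* val, const float* x, float* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) return;  // threads past the last row do nothing
    float sum = 0.0f;
    for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
        sum += val[k] * x[colIdx[k]];
    y[row] = sum;
}

int main() {
    // Same information deviceQuery prints, read for device 0.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("warpSize = %d, maxThreadsPerBlock = %d\n",
                prop.warpSize, prop.maxThreadsPerBlock);

    // Invented test data: A = 2 * I with 100 rows (more than one warp), x = 1.
    const int nrows = 100;
    std::vector<int> rowPtr(nrows + 1), colIdx(nrows);
    std::vector<float> val(nrows, 2.0f), x(nrows, 1.0f), y(nrows, 0.0f);
    for (int i = 0; i < nrows; ++i) { rowPtr[i] = i; colIdx[i] = i; }
    rowPtr[nrows] = nrows;

    int *dRowPtr = nullptr, *dColIdx = nullptr;
    float *dVal = nullptr, *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dRowPtr, (nrows + 1) * sizeof(int));
    cudaMalloc(&dColIdx, nrows * sizeof(int));
    cudaMalloc(&dVal, nrows * sizeof(float));
    cudaMalloc(&dX, nrows * sizeof(float));
    cudaMalloc(&dY, nrows * sizeof(float));
    cudaMemcpy(dRowPtr, rowPtr.data(), (nrows + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, colIdx.data(), nrows * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, val.data(), nrows * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dX, x.data(), nrows * sizeof(float), cudaMemcpyHostToDevice);

    // Several threads per block, as many blocks as needed to cover all rows.
    const int threads = 128;
    mulSketch<<<blocksFor(nrows, threads), threads>>>(nrows, dRowPtr, dColIdx, dVal, dX, dY);

    // Invalid launch configurations are reported by cudaGetLastError;
    // errors raised while the kernel runs surface at the synchronize call.
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(y.data(), dY, nrows * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("y[0] = %.1f, y[%d] = %.1f\n", y[0], nrows - 1, y[nrows - 1]);

    cudaFree(dRowPtr); cudaFree(dColIdx); cudaFree(dVal); cudaFree(dX); cudaFree(dY);
    return 0;
}
```

Checking cudaGetLastError after every launch in the real solver should at least show whether the launch or the kernel itself is what fails once nrows reaches 32.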

answered Mar 22 '16 at 13:19

yogesh_desai
