-2

I am trying to implement an iterative linear solver, the Conjugate Gradient solver, in CUDA. It solves equations of the form

A*x=b,

where A is a sparse symmetric positive definite matrix of size n x n, x is the unknown vector of size n (with an initial guess of 0), and b is the right-hand-side vector of size n.

My code includes several operations, such as sparse matrix-vector multiplication and vector-vector operations.

My code works fine for matrix sizes up to 31 x 31, but not for anything larger. This may be because of the number of threads I allocate to a kernel function. I launch the kernel as

mul<<<1,nrows>>>()

Here mul is the kernel that performs the sparse matrix-vector multiplication, and nrows is the number of rows in the sparse matrix A.
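For illustration only, here is a minimal sketch of what such a kernel and a more general launch could look like. It is not the code from this question: it assumes the matrix is stored in CSR format, and the names rowPtr, colIdx, vals, blocksFor and launchMul are hypothetical. Instead of a single block of nrows threads, it covers the rows with several blocks of 256 threads and checks for errors after the launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical CSR sparse matrix-vector product y = A*x, one thread per row.
// rowPtr has nrows + 1 entries; colIdx and vals hold the nonzeros row by row.
__global__ void mul(const int *rowPtr, const int *colIdx, const double *vals,
                    const double *x, double *y, int nrows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) return;  // guard: the last block may have spare threads

    double sum = 0.0;
    for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
        sum += vals[k] * x[colIdx[k]];
    y[row] = sum;
}

// Number of blocks needed to cover n rows (ceiling division).
inline int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launch wrapper that reports both launch and execution errors.
// All pointers are assumed to be device pointers.
inline cudaError_t launchMul(const int *rowPtr, const int *colIdx,
                             const double *vals, const double *x,
                             double *y, int nrows)
{
    const int threads = 256;  // assumed block size, not taken from the question
    mul<<<blocksFor(nrows, threads), threads>>>(rowPtr, colIdx, vals, x, y, nrows);

    cudaError_t err = cudaGetLastError();   // e.g. invalid launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel runs
    if (err != cudaSuccess)
        fprintf(stderr, "mul failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

With this layout the launch no longer depends on nrows fitting into one block, and any error from the kernel is printed instead of being silently ignored.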

Is this problem related to the warp size of 32 threads?

If anyone knows, please advise.

Thank you!

Slava Vedenin
PujaK
    Without any code this question is impossible to answer without speculation. Please include a [mcve]. – havogt Mar 22 '16 at 09:50

1 Answer

-2

Try running the deviceQuery program from the NVIDIA CUDA Samples to find the warp size reported for your device. If it shows a warp size of 32, then your problem may be related to it; otherwise, a specific code snippet is needed before any solution can be given.
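The same information can also be read in code. Below is a minimal sketch, assuming device 0 is the GPU in use; the helper name printLaunchLimits is hypothetical, while cudaGetDeviceProperties and the warpSize and maxThreadsPerBlock fields come from the CUDA runtime API. As far as I know, the warp size is 32 on NVIDIA GPUs to date, and the number of threads per block is bounded separately by maxThreadsPerBlock, so a block of more than 32 threads is a valid launch in itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the limits relevant to a launch such as mul<<<1, nrows>>>().
// Returns the maximum number of threads per block, or -1 on error.
int printLaunchLimits(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    printf("device:                %s\n", prop.name);
    printf("warp size:             %d\n", prop.warpSize);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    return prop.maxThreadsPerBlock;
}
```

Calling printLaunchLimits(0) from main and comparing the result with nrows shows whether a single-block launch can hold all rows at all.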

yogesh_desai