I am trying to replicate the GPU linear programming solver described in this report:
http://www.idi.ntnu.no/~elster/master-studs/spampinato/spampinato-linear-prog-gpu-report.pdf
The device I am using is a Quadro FX 1800M with compute capability 1.2.
My problem is that when I launch more than 22 threads per block, most of the time I get inaccurate results (sometimes all zeros). In a few cases, however, I get accurate results even with 512 threads per block.
Here are some test runs I made. "Sequential" refers to a CPU-based version used for comparison.
Iteration No 1 : of Sequential Version
Optimum Found 24.915583
Elapsed time: 0.001049725
Iteration No 1: of Parallel Version
BS-(Number of Threads) = : 20
Optimum found: 24.915583
Iteration No 2: of Parallel Version
BS-(Number of Threads) = : 256
Optimum found: 24.915607
Iteration No 3: of Parallel Version
BS-(Number of Threads) = : 512
Optimum found: 24.917068
Iteration No 4: of Parallel Version
BS-(Number of Threads) = : 2
Optimum found: 24.915583
Iteration No 5: of Parallel Version
BS-(Number of Threads) = : 456
Optimum found: -30693000299230806209574138333792043008.000000
Iteration No 6: of Parallel Version
BS-(Number of Threads) = : 456
Problem unsolvable: either qth==0 or loop too long.
Iteration No 7: of Parallel Version
BS-(Number of Threads) = : 512
Optimum found: 25.010513
Iteration No 8: of Parallel Version
BS-(Number of Threads) = : 256
Problem unsolvable: either qth==0 or loop too long.
Iteration No 9: of Parallel Version
BS-(Number of Threads) = : 256
Optimum found: 0.000000
Iteration No 10: of Parallel Version
BS-(Number of Threads) = : 512
Optimum found: 0.000000
Can somebody kindly point out what I might be doing wrong? I know I haven't posted the code, but I am assuming it is correct because I copied it from the research paper, so the problem must be on my end.
I should also point out that I get the following warning when compiling the CUDA code:
ptxas /tmp/tmpxft_000017e7_00000000-10_culiblp.ptx, line 263; warning : Double is not supported. Demoting to float
Might this be the reason for these results?