I have a matrix and I only want to access its lower triangular part. I am trying to find a good thread index, but so far I have not managed it. Any ideas? I need an index to loop over the lower triangular part. Say this is my matrix:
1 2 3 4
5 6 7 8
9 0 1 2
3 5 6 7
The index should visit:
1
5 6
9 0 1
3 5 6 7
In this example, those are positions 0, 4, 5, 8, 9, 10, 12, 13, 14, 15 of the matrix stored row by row as a 1D array.
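The positions above are exactly what a row-by-row double loop over (row i, column j <= i) produces when each element is addressed as i*N + j. The following is a minimal sketch in plain C, so it runs on the host, and the function name lower_tri_walk is hypothetical. It records that offset together with a packed index k = i(i+1)/2 + j. The packed index simply counts 0, 1, 2, ... through the triangle, which makes it a natural candidate for a 1D thread index.

```c
#include <assert.h>

/* Walk the lower triangle (diagonal included) of an n x n row-major matrix,
   row by row. offsets[] receives i*n + j (position in the 1D array);
   packed[] receives i*(i+1)/2 + j (position within the triangle alone).
   Both arrays must hold at least n*(n+1)/2 entries. Returns the count. */
int lower_tri_walk(int n, int *offsets, int *packed)
{
    assert(n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++) {
            offsets[count] = i * n + j;
            packed[count] = i * (i + 1) / 2 + j;
            count++;
        }
    }
    return count; /* equals n*(n+1)/2 */
}
```

For N = 4, lower_tri_walk returns 10. It fills offsets with 0, 4, 5, 8, 9, 10, 12, 13, 14, 15 and fills packed with 0 through 9.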
The CPU loop is:
for(i = 0; i < N; i++){
    for(j = 0; j <= i; j++){
        .......
    }
}
where N is the number of rows. In the kernel I tried something like this:
__global__ void Kernel(int N) {
    // row and column handled by this thread
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    // only threads on or below the diagonal print
    if ((row < N) && (col <= row))
        printf("%d\n", row + col);
}
and then launched it this way:
dim3 Blocks(1,1);
dim3 Threads(N,N);
Kernel<<< Blocks, Threads>>>(N);
but it doesn't work at all. This is what I get:
0
1
2
2
3
4
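The six lines above match what the kernel computes if N = 3. This is an assumption, since the post does not state N. With one 3 x 3 block, the threads that pass the guard are (0,0), (1,0), (1,1), (2,0), (2,1), (2,2). printf receives row + col, which gives 0, 1, 2, 2, 3, 4, rather than the row-major position row * N + col, which would give 0, 3, 4, 6, 7, 8.

The sketch below is a host-side emulation of the kernel's guard. The name emulate_kernel is hypothetical. It records both values side by side. Because GPU printf output order across threads is not guaranteed, the emulation simply fixes row-major order.

```c
#include <assert.h>

/* Host emulation of Kernel(N) launched as one n x n block: for every
   (row, col) thread that passes the kernel's guard, record the value the
   kernel prints (row + col) and, for comparison, the row-major position
   row*n + col. Arrays need n*(n+1)/2 entries; returns how many passed. */
int emulate_kernel(int n, int *printed, int *position)
{
    assert(n >= 0);
    int count = 0;
    for (int row = 0; row < n; row++) {      /* threadIdx.x in the kernel */
        for (int col = 0; col < n; col++) {  /* threadIdx.y in the kernel */
            if (row < n && col <= row) {     /* same guard as the kernel */
                printed[count] = row + col;
                position[count] = row * n + col;
                count++;
            }
        }
    }
    return count;
}
```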
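One possible approach, sketched here under stated assumptions rather than as a known-good fix, is a 1D grid with one thread per element of the triangle. That is N(N+1)/2 threads in total. Each thread would take its linear index k = blockIdx.x * blockDim.x + threadIdx.x and recover (row, col) from it.

Inverting k = row(row+1)/2 + col gives:

- row = floor((sqrt(8k + 1) - 1) / 2)
- col = k - row(row+1)/2

The code is plain host C, so it can be checked without a GPU. It uses an integer square root to avoid floating-point rounding for large k. All function names (tri_count, tri_blocks, isqrt_ull, tri_to_rowcol) are hypothetical, and moving the same arithmetic into a __device__ function is untested here.

```c
#include <assert.h>

/* Number of elements in the lower triangle of an n x n matrix, diagonal included. */
long long tri_count(int n)
{
    assert(n >= 0);
    return (long long)n * (n + 1) / 2;
}

/* Blocks needed for a 1D launch with one thread per triangle element.
   The last block may contain spare threads, so the kernel must still
   check k < tri_count(n). */
long long tri_blocks(int n, int threads_per_block)
{
    assert(threads_per_block > 0);
    return (tri_count(n) + threads_per_block - 1) / threads_per_block;
}

/* Integer square root: largest s with s*s <= n (binary search, no floats). */
static unsigned long long isqrt_ull(unsigned long long n)
{
    unsigned long long lo = 0;
    unsigned long long hi = n < 4294967295ULL ? n : 4294967295ULL;
    while (lo < hi) {
        unsigned long long mid = lo + (hi - lo + 1) / 2;
        if (mid <= n / mid)   /* same as mid*mid <= n, without overflow */
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Map k (0 <= k < tri_count(n)) back to (row, col) with col <= row by
   inverting k = row*(row+1)/2 + col. */
void tri_to_rowcol(long long k, int *row, int *col)
{
    assert(k >= 0);
    unsigned long long s = isqrt_ull(8ULL * (unsigned long long)k + 1ULL);
    long long r = (long long)((s - 1) / 2);
    *row = (int)r;
    *col = (int)(k - r * (r + 1) / 2);
}
```

For N = 4, tri_count gives 10 threads. Thread k then works on element row * N + col, so k = 0..9 map to positions 0, 4, 5, 8, 9, 10, 12, 13, 14, 15.

A single block is limited to 1024 threads on devices of compute capability 2.0 and later. That means a launch with one block and dim3 Threads(N,N) can only cover N <= 32. A 1D grid sized with something like tri_blocks is not bound by that per-block limit.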