I'm trying to compute a matrix inverse using Gauss-Jordan elimination in CUDA. Here is my kernel code, which is meant to reduce a given n*n matrix to a diagonal matrix:
__global__ void gaussjordan(float *A, float *I, int n)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    float P;
    if (x != y)
    {
        P = A[x*n + y] / A[y*n + y];
        for (int k = 0; k < n; k++) {
            I[x*n + k] -= I[y*n + k] * P;
            A[x*n + k] = A[x*n + k] - A[y*n + k] * P;
        }
        __syncthreads();
    }
}
The problem is that the matrix A[] is not being updated with the modified values: the threads keep reading the initial values.
For example, with n=3 and A[n*n] = [1 2 2 2 2 2 2 2 3], after the threads with x=1, y=0 and x=2, y=0 have run, A[] becomes [1 2 2 0 -2 -2 0 -2 -1]. The threads with x=0, y=1 and x=2, y=1 should then use the modified A[] values, but instead they use the original A values. Can anyone help me figure out how to update the values so that later threads use the modified ones?