I am working on a matrix-transpose in-place-implementation, with the following idea: Let the indices only take values that correspond to upper triangular entries (since diagonal ones stay constant during the transformation) and exchange values A[i,j] and A[j,i] within one thread via a temporary integer.
I am having trouble to assign the correct next index pair A[i_next,j_next] to the threat currently working on A[i,j] (this corresponds to the last two lines in the code snippet). How do I have to modify these two lines in order to access only the correct matrix entries for each threat?
Example:
for N=5; total-threads=3;
This is a 5x5 matrix, where the entries hold the thread_id of the threat they shall be accessed by. Those belonging to the first thread are italic as well (just to show which indexes I want row_idx, col_idx to assume for that specific thread.
Col0 | Col1 | Col2 | Col3 | Col4 |
---|---|---|---|---|
/ | 1 | 2 | 3 | 1 |
/ | / | 2 | 3 | 1 |
/ | / | / | 2 | 3 |
/ | / | / | / | 1 |
/ | / | / | / | / |
__global__ void in_place_transpose(double *A, int N)
{
int t_idx = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int total_threads = blockDim.x * gridDim.x;
int row_idx = t_idx / N;
int col_idx = t_idx % N;
// only access upper triangular entries
if (row_idx < col_idx)
{
while (row_idx < (N - 1) && col_idx < N) // A[N-1,N] is the last entry we change
{
int to = row_idx * N + col_idx;
int from = col_idx * N + row_idx;
if (from != to) // daigonal is constant
{
// printf("from: %d, to: %d \n", from, to); // show operation pair
// printf("th: %d, row: %d, col: %d \n", t_idx, row_idx, col_idx); // show indizes
int temp = A[to];
A[to] = A[from];
A[from] = temp;
}
// assign next matrix entry to thread (FAULTY)
row_idx = (int)((row_idx + total_threads) % (N - row_idx - 1));
col_idx = (int)(col_idx + (total_threads % (N - row_idx - 1)) % (N - row_idx - 1));
}
}
}