2

How to to parallelize the matrix transpose ?

I know that to transpose matrix I must apply something about this:

for (int i = 0; i < matrix.length - 1; i++) {
    for (int j = i + 1; j < matrix[i].length; j++) {
        tmp = matrix[i][j];
        matrix[i][j] = matrix[j][i];
        matrix[j][i] = tmp;
    }
}

But how to parallelize this operation, I dont know.

I need create N threads to transpose matrix 4n x 4n.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
user2430023
  • 31
  • 1
  • 4
  • 1
    Definitely use a well optimized lib for that - those even can use native packages and high CPU-optimization, I hope. Really optimizing a matrix-multiplication is a long road to walk or crawl. See http://stackoverflow.com/q/529457/2277620, it refers to JBlas. – flaschenpost May 31 '13 at 20:50
  • 1
    +1 transpose is not a CPU bound operation so I wouldn't expect using multiple threads to help. Transpose is a memory bandwidth bound operation i.e. lots of copies from different points in memory. In other words, you can do it, but I would expect it to be slower, not faster. – Peter Lawrey May 31 '13 at 21:25

3 Answers3

8

Since this sounds like a homework problem, I won't give you the answer straight out, but I'll point you in the right direction.

Let's say you're transposing a 4x4 matrix:

A B C D      A E I M
E F G H  ->  B F J N
I J K L      C G K O
M N O P      D H L P

If we break that down into four sub-matrices:

A B | C D      A E | I M
E F | G H      B F | J N
----+----  ->  ----+----
I J | K L      C G | K O
M N | O P      D H | L P

Notice that the resulting four sub-matrices are all transposes of the four you started with (with the upper right and lower left matrices swapped). How can you make use of this? :)

  • Yes, this is a hometask) I understand your idea, but I don`t understand two last sentence :[ – user2430023 May 31 '13 at 21:03
  • Compare the top left quarter of the matrix before and after the transpose. How are these two related to one another? –  May 31 '13 at 21:09
  • they are varied with rows of columns, that I need. thank you. I will trying today write it. – user2430023 Jun 01 '13 at 09:30
1

I find that it's frequently better to just carry a "transposed" flag (bool, bit, whatever) and use that to reverse your indexing calculations. That seems to be the way of the BLAS, LAPACK, etc.

It's going to be hard to get much parallel speedup here because of cache contention anyways.

Drew Hall
  • 28,429
  • 12
  • 61
  • 81
0

If you want a simple parallel solution to your problem, something like this could work.

double[][] matrix=new double[numberOfRows][numberOfColumns];
double[][] transpose = new double[numberOfColumns][numberOfRows];
IntStream.range(0, numberOfColumns * numberOfRows).parallel().forEach(i ->
{
    int m = i / numberOfRows;
    int n = i % numberOfRows;
    transpose[m][n] = matrix[n][m];
});

This uses a parallel IntStream, which you could think of as a parallelized for-loop that runs for the number of elements in the matrix. Notice that I assigned two variables to get the actual row and column I need to target for the transposition.

Dividing the index i the stream is currently at by the number of rows gives you the index of the target row in the transpose matrix. The modulo of index i and the number of rows gives you the column of the transpose matrix that should be assigned.

darkblueflow
  • 51
  • 2
  • 4