I've divided a matrix into blocks and multiplied it using Fox's algorithm.
How can I print the result matrix to the screen when it is stored as blocks in different processes, without sending these blocks back to the process with rank 0?
For example, after multiplication I've got:
Block A:
83 64
112 76
Block B:
118 44
152 34
Block C:
54 68
67 56
Block D:
89 85
114 68
The entire matrix should look like:
83 64 118 44
112 76 152 34
54 68 89 85
67 56 114 68
What I have so far: I send the two blocks that make up one row of blocks to a single process and print those rows to the screen. But is it possible to print the entire result matrix without sending more than one block to process 0?
// Function for gathering the result matrix
// pCblock - one block containing part of the entire result matrix
// Size - matrix dimension
// BlockSize - block dimension
// RowComm - communicator defined elsewhere
void ResultCollection(double* pCblock, int Size, int BlockSize) {
    // Buffer for BlockSize full matrix rows
    double* pResultRow = new double[Size * BlockSize];
    for (int i = 0; i < BlockSize; i++) {
        // Gather row i of every block in RowComm into matrix row i
        MPI_Gather(&pCblock[i * BlockSize], BlockSize, MPI_DOUBLE,
                   &pResultRow[i * Size], BlockSize, MPI_DOUBLE, 0, RowComm);
    }
    // print two matrix rows from two blocks
    delete[] pResultRow;
}
This doesn't help (Ordering Output in MPI), because for the matrix output I need to print not the entire block A, then B, then C, then D, but rather one line from A (in process 0), one line from B (from process 1), one line from A (in process 0), one line from B (from process 1), one line from C (from process 2), one line from D (from process 3), and so on.
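To make the order I need concrete, here is a minimal sketch without any MPI calls. It only computes which process would have to print which of its local rows, and then uses the four example blocks above to check that this order yields the full matrix. The names PrintStep, PrintOrder, FormatMatrix and GridSize are hypothetical and not part of my real code, and I'm assuming the ranks are laid out row by row in the process grid (A = 0, B = 1, C = 2, D = 3).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One step of the desired output order (hypothetical helper type).
struct PrintStep {
    int rank;      // owning process, assuming row-major ranks in the grid
    int localRow;  // row index inside that process's block
};

// Order in which block rows must be printed so the output reads as the
// full matrix, one matrix row at a time.
std::vector<PrintStep> PrintOrder(int GridSize, int BlockSize) {
    std::vector<PrintStep> order;
    for (int gridRow = 0; gridRow < GridSize; ++gridRow)
        for (int localRow = 0; localRow < BlockSize; ++localRow)
            for (int gridCol = 0; gridCol < GridSize; ++gridCol)
                order.push_back({gridRow * GridSize + gridCol, localRow});
    return order;
}

// Builds the text of the full matrix by walking the blocks in that order.
// blocks[r] holds the BlockSize x BlockSize block of rank r, row-major.
std::string FormatMatrix(const std::vector<std::vector<double>>& blocks,
                         int GridSize, int BlockSize) {
    assert(blocks.size() == static_cast<std::size_t>(GridSize) * GridSize);
    const std::vector<PrintStep> order = PrintOrder(GridSize, BlockSize);
    std::ostringstream out;
    for (std::size_t k = 0; k < order.size(); ++k) {
        const std::vector<double>& block = blocks[order[k].rank];
        for (int j = 0; j < BlockSize; ++j) {
            if (j > 0) out << ' ';
            out << block[order[k].localRow * BlockSize + j];
        }
        // A matrix row is complete after GridSize block-row segments.
        bool endOfMatrixRow = (k + 1) % static_cast<std::size_t>(GridSize) == 0;
        out << (endOfMatrixRow ? '\n' : ' ');
    }
    return out.str();
}
```

For the 2x2 grid of 2x2 blocks above, PrintOrder(2, 2) gives rank 0 row 0, rank 1 row 0, rank 0 row 1, rank 1 row 1, then the same pattern for ranks 2 and 3. What I'm missing is how to enforce this order across real MPI processes.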