I have been trying to implement a distributed matrix transpose program. The main idea is to give each processor a p x q template, split the matrix among the processors according to that template, and then use a block-cyclic distribution once the blocks have been assigned to the processors. What I am trying to do is described in the short paper at the link below, ...
I have already checked this answer from your site:
sending blocks of 2D array in C using MPI
It seems fine, since it builds the blocks with the MPI type-creation routines. In my code I went with MPI_Cart_create and a Cartesian topology instead, but I got stuck halfway because I don't understand what they did in the paper: how did they distribute the blocks among the different processors, i.e. how do I program the 2D block-cyclic distribution over them?
So, my questions, if you can help me:
How exactly do I code a 2D block-cyclic distribution (say a 12x12 matrix, with each processor having a 3x4 template)? My current understanding of the index mapping is sketched right after these questions.
Can you look at the link above and explain how they distribute the blocks among the processors? And finally, should I keep going with the Cartesian topology? I will take any help I can get; I am desperate.
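To make the first question concrete, here is how I currently understand the ScaLAPACK-style 2D block-cyclic index mapping, as pure index arithmetic with no MPI yet. I am assuming a 2x2 process grid (since I have 4 processors) and 3x4 blocks; the constants and the little test program are only my own illustration, so please correct me if the mapping itself is wrong:

/* My understanding of the 2D block-cyclic mapping: for a global index g,
 * block size nb and P processes in that dimension,
 *   owner = (g / nb) % P
 *   local = ((g / nb) / P) * nb + g % nb
 */
#include <stdio.h>

#define N      12   /* global matrix is N x N           */
#define NB_R    3   /* block height (template rows)     */
#define NB_C    4   /* block width (template columns)   */
#define P_ROWS  2   /* assumed process-grid rows        */
#define P_COLS  2   /* assumed process-grid columns     */

int main(void)
{
    int i, j;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            int prow = (i / NB_R) % P_ROWS;                      /* owning process row    */
            int pcol = (j / NB_C) % P_COLS;                      /* owning process column */
            int li   = ((i / NB_R) / P_ROWS) * NB_R + i % NB_R;  /* local row index       */
            int lj   = ((j / NB_C) / P_COLS) * NB_C + j % NB_C;  /* local column index    */
            printf("A[%2d][%2d] -> process (%d,%d), local [%d][%d]\n",
                   i, j, prow, pcol, li, lj);
        }
    }
    return 0;
}

If that is right, each process ends up owning several non-adjacent blocks of the global matrix, and the part I cannot figure out is how to express that layout with the Cartesian topology and MPI datatypes.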
Below is (part of) my code; I don't know what the next step should be. A sketch of what I think the next step might look like comes at the very end, after the code.
#include "mpi.h"
#include <stdio.h>
#define NP 4 // number of processors
#define M_ROW 4 //template of processor row
#define M_COL 3 //template of processor col
int main(int argc, char *argv[])
{
int myid, numprocs;
MPI_Comm comm;
int dim[2], period[2], reorder;
int coord[2];
int A[8][6], array_P[M_ROW][M_COL]; //, AT[8][6];
int n =0, Temp;
int TT[8][6];
int iv, jv, rankid; // for coordinates of each processor in the Cartesian matrix
int k, y, i,j;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
// First: building the matrix
for (i = 0; i < 8; i++)
for (j = 0; j < 6; j++)
{
A[i][j] = n;
n++;
}
//Second to the virtual matrix with each processor having cartesian Coord.
dim[0]= 2; // dimension
dim[1]= 2; // dimensions assign for Cartesian
period[0]=1; period[1]=1; //row periodic + col periodic (each column/row forms a ring)
reorder=1; // here is false in meaning to allow the reordering of the processors
MPI_Cart_create(MPI_COMM_WORLD, 2, dim, period, reorder, &comm);
MPI_Comm_rank(comm, &rankid); // creating rank for each processor location
MPI_Cart_coords(comm, rankid, 2, coord); // creating coordinates for each Prc.
MPI_Barrier(MPI_COMM_WORLD);
iv = coord[0];
jv = coord[1];
//printf("Processor Rank %d receive dimensions (iv,jv)-> iv: %d ,jv: %d \n", myid, coord[0], coord[1]);
for (k=0; k<M_ROW; k++)
{
for (y=0; y<M_COL; y++)
{
i = k + iv*M_ROW;
j = y + jv*M_COL;
//array_P[k][y] = i*10 + j;
array_P[k][y] = A[i][j];
}
}//end loop of filling data
//("Processor %d: Before Transpose:\n", myid);
if(myid == 3)
{
for (k=0; k<M_ROW; k++) // 3 ?? NBLK_R;
{
j = k + iv*M_ROW;
for (y=0; y<M_COL; y++) // 2 ?
{
i = y + jv*M_COL;
printf(" %d ", A[j][i]);
}
printf("\n");
}
}
printf("\n");
//MPI_Alltoall(TT, M_ROW*M_COL, MPI_INT, TT, M_ROW*M_COL, MPI_INT, MPI_COMM_WORLD);
/*
if(myid == 2)
{
for (k=0; k<M_ROW; k++) // 3 ?? NBLK_R;
{
// = k + iv*M_ROW;
for (y=0; y<M_COL; y++) // 2 ?
{
//i = y + jv*M_COL;
//Final[j][i] = array_PT[x][y];// check the arraypt ?
printf(" %d ", array_P[k][y]);
}
printf("\n");
}
} */
//Fourth - transposing the original matrix
for (k=0; k<M_ROW; k++)
{
for (y=0; y<M_COL; y++)
{
i = k + iv*M_ROW;
j = y + jv*M_COL;
Temp = A[i][j];
A[i][j] = A[j][i];
A[j][i] = Temp;
}
}
printf("\n \n");
if(myid == 3)
{
for (k=0; k<M_ROW; k++) // 3 ?? NBLK_R;
{
j = k + iv*M_ROW;
for (y=0; y<M_COL; y++) // 2 ?
{
i = y + jv*M_COL;
printf(" %d ", A[j][i]);
}
printf("\n");
}
}
printf("\n");
//MPI_Barrier(comm);
// send to main process - process 0 in our case - all the array_PT transposed
// ml*nl -> 2*3
//MPI_Send(array_PT,M_COL*M_ROW , MPI_INT, 0, 1, comm);
//MPI_Isend(array_PT,M_COL*M_ROW , MPI_INT, 0, 1, comm, &request);
//MPI_Barrier(MPI_COMM_WORLD);
//int iv_tt , jv_tt;
//******************************
MPI_Finalize();
return 0;
}
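For completeness, this is roughly what I took away from the linked answer as the likely next step: describe one 4x3 block of the 8x6 matrix with a derived datatype and hand one block to each rank with MPI_Scatterv. The constants, the rank-to-block mapping (rank r takes block (r/2, r%2)) and the resized-extent trick are my own guesses for my example, not something taken from the paper, and I have not tested this:

/* Sketch only: scatter one 4x3 block of an 8x6 matrix to each of 4 ranks.
 * A vector type describes a block inside the row-major global matrix, and
 * resizing its extent to one block row (BC ints) lets the displacements be
 * given in block-column units.
 */
#include "mpi.h"
#include <stdio.h>

#define GR 8   /* global rows    */
#define GC 6   /* global columns */
#define BR 4   /* block rows     */
#define BC 3   /* block columns  */

int main(int argc, char *argv[])
{
    int rank, nprocs, i, j;
    int A[GR][GC];       /* full matrix, only meaningful on rank 0 */
    int local[BR][BC];   /* this rank's block                      */
    MPI_Datatype blk, blkresized;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (nprocs != 4) {   /* this sketch assumes exactly 4 processes */
        if (rank == 0) printf("run with 4 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0)
        for (i = 0; i < GR; i++)
            for (j = 0; j < GC; j++)
                A[i][j] = i*GC + j;

    /* a BR x BC block strided inside the GR x GC row-major matrix */
    MPI_Type_vector(BR, BC, GC, MPI_INT, &blk);
    /* shrink the extent to BC ints so displacements count block columns */
    MPI_Type_create_resized(blk, 0, BC*sizeof(int), &blkresized);
    MPI_Type_commit(&blkresized);

    /* rank r takes block (r/2, r%2); block (bi,bj) starts bi*BR*GC + bj*BC
       ints into A, which is bi*BR*(GC/BC) + bj extents of BC ints */
    int counts[4], displs[4];
    for (i = 0; i < 4; i++) {
        int bi = i / 2, bj = i % 2;
        counts[i] = 1;
        displs[i] = bi*BR*(GC/BC) + bj;
    }

    MPI_Scatterv(A, counts, displs, blkresized,
                 local, BR*BC, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 3) {   /* quick check of one block */
        for (i = 0; i < BR; i++) {
            for (j = 0; j < BC; j++)
                printf(" %2d", local[i][j]);
            printf("\n");
        }
    }

    MPI_Type_free(&blkresized);
    MPI_Type_free(&blk);
    MPI_Finalize();
    return 0;
}

If this is the right direction, I assume the transpose itself would then be a local transpose of each block plus a communication step that exchanges block (i,j) with block (j,i), but that exchange is exactly the part of the paper I do not understand.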