Matrix calculation error appears when dimensions become large

Question

I am running a code where I am simply creating 2 matrices: one matrix is of dimensions arows x nsame and the other has dimensions nsame x bcols. The result is an array of dimensions arows x bcols. This is fairly simple to implement using BLAS and the following code appears to work as intended when using the below master-slave model with OpenMPI:`

#include <iostream>
#include <stdio.h>
#include <iostream>
#include <cmath>
#include <mpi.h>
#include <gsl/gsl_blas.h>
using namespace std;`

int main(int argc, char** argv){
    int noprocs, nid;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &nid);
    MPI_Comm_size(MPI_COMM_WORLD, &noprocs);
    int master = 0;

    const int nsame = 500; //must be same if matrices multiplied together = acols = brows
    const int arows = 500;
    const int bcols = 527; //works for 500 x 500 x 527 and 6000 x 100 x 36
    int rowsent;
    double buff[nsame];
    double b[nsame*bcols];
    double c[arows][bcols];
    double CC[1*bcols]; //here ncols corresponds to numbers of rows for matrix b
    for (int i = 0; i < bcols; i++){
                CC[i] = 0.;
    }; 
    // Master part
    if (nid == master ) { 

        double a [arows][nsame]; //creating identity matrix of dimensions arows x nsame (it is I if arows = nsame)
        for (int i = 0; i < arows; i++){
            for (int j = 0; j < nsame; j++){
                if (i == j)
                    a[i][j] = 1.;
                else
                    a[i][j] = 0.;
            }
        }
        double b[nsame*bcols];//here ncols corresponds to numbers of rows for matrix b
            for (int i = 0; i < (nsame*bcols); i++){
                b[i] = (10.*i + 3.)/(3.*i - 2.) ;
            }; 
        MPI_Bcast(b,nsame*bcols, MPI_DOUBLE_PRECISION, master, MPI_COMM_WORLD);  
        rowsent=0;
        for (int i=1; i < (noprocs); i++) {  
            // Note A is a 2D array so A[rowsent]=&A[rowsent][0]
            MPI_Send(a[rowsent], nsame, MPI_DOUBLE_PRECISION,i,rowsent+1,MPI_COMM_WORLD);
            rowsent++; 
        }

        for (int i=0; i<arows; i++) { 
            MPI_Recv(CC, bcols, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status); 
            int sender = status.MPI_SOURCE;
            int anstype = status.MPI_TAG;            //row number+1
            int IND_I = 0;
            while (IND_I < bcols){
                c[anstype - 1][IND_I] = CC[IND_I]; 
                IND_I++;
            }
            if (rowsent < arows) {
                MPI_Send(a[rowsent], nsame,MPI_DOUBLE_PRECISION,sender,rowsent+1,MPI_COMM_WORLD);
                rowsent++; 
            }
            else {       // tell sender no more work to do via a 0 TAG
                MPI_Send(MPI_BOTTOM,0,MPI_DOUBLE_PRECISION,sender,0,MPI_COMM_WORLD);
            }
        }
    }

    // Slave part
    else { 
        MPI_Bcast(b,nsame*bcols, MPI_DOUBLE_PRECISION, master, MPI_COMM_WORLD); 
        MPI_Recv(buff,nsame,MPI_DOUBLE_PRECISION,master,MPI_ANY_TAG,MPI_COMM_WORLD,&status); 
        while(status.MPI_TAG != 0) {
            int crow = status.MPI_TAG; 
            gsl_matrix_view AAAA = gsl_matrix_view_array(buff, 1, nsame);
            gsl_matrix_view BBBB = gsl_matrix_view_array(b, nsame, bcols);
            gsl_matrix_view CCCC = gsl_matrix_view_array(CC, 1, bcols);

            /* Compute C = A B */
            gsl_blas_dgemm (CblasNoTrans, CblasNoTrans, 1.0, &AAAA.matrix, &BBBB.matrix,
                            0.0, &CCCC.matrix); 

            MPI_Send(CC,bcols,MPI_DOUBLE_PRECISION, master, crow, MPI_COMM_WORLD);
            MPI_Recv(buff,nsame,MPI_DOUBLE_PRECISION,master,MPI_ANY_TAG,MPI_COMM_WORLD,&status); 
        }
    }

    // output c here on master node //uncomment the below lines if I wish to see the output
    //    if (nid == master){
//        if (rowsent == arows){
//            //            cout << rowsent;
//            int IND_F = 0;
//            while (IND_F < arows){
//                int IND_K = 0;
//                while (IND_K < bcols){
//                    cout << "[" << IND_F << "]" << "[" << IND_K << "] = " << c[IND_F][IND_K] << " ";
//                    IND_K++;
//                }
//                cout << "\n";
//                IND_F++;
//            }
//        }
//    }
    MPI_Finalize();
    //free any allocated space here
    return 0;
};

Now what appears odd is that when I increase size of the matrices (e.g. from nsame = 500 to nsame = 501), the code no longer works. I receive the following error:

mpirun noticed that process rank 0 with PID 0 on node Users-MacBook-Air exited on signal 11 (Segmentation fault: 11).

I have tried this with other combinations of sizes for the matrices and there always appears to be an upper limit for the size of the matrices themselves (which seems to vary based on how I vary the different dimensions themselves). I have also tried modifying the values of the matrices themselves although this does not appear to change anything. I realize there are alternative ways to initialize the matrices in my example (e.g. using vector) but am simply wondering why my current scheme of multiplying matrices of arbitrary size seems to only work to a certain extent.

score 2 · Accepted Answer · answered Jan 30 '17 at 06:23

You're declaring too many big local variables, which is causing stack space related problems. a, in particular, is 500x500 doubles (250000 8 byte elements, or 2 million bytes). b is even larger.

You'll need to dynamically allocate space for some or all of those arrays.

There might be a compiler option to increase the initial stack space but that isn't a good long term solution.

Matrix calculation error appears when dimensions become large

1 Answers1