
Hello, I am new to programming with MPI. I'm trying to multiply two matrices together (an NxN matrix (A) and an Nx1 matrix (B)) to get a resulting Nx1 matrix (C). Each process is supposed to calculate one row (element) of matrix C; however, only process 0 (my master process) calculates correctly, as it does not appear to wait for the other processes to finish calculating. I am also unsure whether the non-master procs are sending their results back correctly (or whether they even need to?). Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "mpi.h"

#define PRINT_VECS 1
#define MAX_RAND 100
#define MASTER 0
#define COLUMNS_B 1
#define N 4

void init_vec(int *vec, int len);
void print_vec(const char *label, int *vec, int len);

void init_vec(int *vec, int len)
{
    int i;
    for (i = 0; i < len; i++)
    {
        vec[i] = rand() % MAX_RAND;
    }    
}

void print_vec(const char *label, int *vec, int len)
{
#if PRINT_VECS
    printf("%s", label);     
    int i;
    for (i = 0; i < len; i++)
    {
        printf("%d ", vec[i]);
    }
    printf("\n\n");
#endif
}

void init_matrix(int** matrix, int rows, int cols)
{
    int i,j;
    for (i = 0; i < rows; i++)
    {
        for (j = 0; j < cols; j++)
        {
            matrix[i][j] = rand() % MAX_RAND;
        }
    }
}

void print_matrix(int** matrix, int rows, int cols)
{
    int i;
    for (i = 0; i < rows; i++)
    {
        printf("|");
        int j;
        for (j = 0; j < cols; j++)
        {
            printf("%d ", matrix[i][j]);
        }
        printf("|\n");
    }
}   


int main(int argc, char *argv[])
{
    int my_rank;
    int num_procs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); //grab this process's rank
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs); //grab the total num of processes

    int results[num_procs]; // used to store the partial sum computed
    int rows, cols, colsB;
    rows = N;
    cols = N;
    colsB = COLUMNS_B;
    int **A; // N x N Matrix
    int B[N]; // N x 1 Matrix
    int **C; // N x 1 Matrix

    double start_time; // use these for timing
    double stop_time;

    if (my_rank == MASTER)
    {
        printf("Number of processes: %d\n", num_procs);
        printf("N: %d\n", N);
        srand(time(NULL));

        // init A
        int i;
        A = malloc(rows * sizeof *A);
        for (i = 0; i < rows; i++)
        {
            A[i] = malloc(cols * sizeof *A[i]);
        }
        init_matrix(A, rows, cols);
        printf("Matrix A:\n");
        print_matrix(A, rows, cols);

        // init B
        init_vec(B, N);
        print_vec("Matrix B:\n", B, N);

        // init C
        C = malloc(rows * sizeof *C);
        for (i = 0; i < rows; i++)
        {
            C[i] = malloc(colsB * sizeof *C[i]);
        }

        start_time = MPI_Wtime();
    }

    MPI_Bcast(B, N, MPI_INT, 0, MPI_COMM_WORLD);
    //MPI_Bcast(A, N, MPI_INT, 0, MPI_COMM_WORLD);

    int row = my_rank;

    int my_sum = 0;

    int i;
    if (my_rank < N)
    {
        for (i = 0; i < N; i++)
        {
            int num = A[row][i] * B[i];
            my_sum = my_sum + num;
        }

        C[row] = &my_sum;
        printf("HAI FROM PROCESS %d! I will calculate row %d. My calculation: %d\n", my_rank, row, my_sum);
    }

    //MPI_Gather(&C, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (my_rank == MASTER)
    {
        stop_time = MPI_Wtime();
        printf("\nMatrix C:\n");
        print_matrix(C, rows, colsB);
        printf("Total time (sec): %f\n", stop_time - start_time);
    }

    MPI_Finalize();

    return EXIT_SUCCESS;
}

I'm pretty sure I'm close, but I'm just missing something. I've tried adding some of the commented-out statements, broadcasting the A matrix and/or calling MPI_Gather, but nothing seems to give results from any proc other than the master proc, so clearly I'm still doing something wrong. Here is some sample output:

Number of processes: 28
N: 4
Matrix A:
|11 30 69 24 |
|83 38 66 71 |
|68 71 27 33 |
|58 5 50 10 |
Matrix B:
1 58 81 44

HAI FROM PROCESS 0! I will calculate row 0. My calculation: 8396

Matrix C:
|8396 |
|-2107258888 |
|-2107258920 |
|-2107258888 |
Total time (sec): 0.000078

So proc 0 is calculating correctly; however, proc 1 is getting a seg fault and I can't figure out why. The error I'm getting is: mpirun noticed that process rank 1 with PID 0 exited on signal 11 (Segmentation fault). Any help would be greatly appreciated!

keenns
    You need to provide the functions that you're not providing like `init_matrix` in order for this question to be resolvable. Besides that, you're using this constant `MASTER` but then in your `BCAST` you're using 0 as the root process. Stick to one in case they're not the same. More importantly, you're not allocating memory on all the processes, just on the root. That's probably one of the reasons your program is failing. – atru Oct 21 '17 at 00:58
  • Also, this is a matrix-vector multiplication.. I have this problem solved with static arrays and a typedef instead of malloc. Post your whole code and we can possibly identify where the issue is. In the problems where I use malloc I also allocate the arrays on every process. Malloc returns a pointer to memory location. MPI is not a shared memory protocol. Your code may in principle run on many different nodes which are different and independent machines. A pointer to a memory location on one of those machines means nothing on the others. – atru Oct 21 '17 at 01:01
  • @atru thanks for the responses! I've added the rest of my code. As a response, the matrices are randomly generated so only the master process is generating them. And `MASTER` == 0 so I may use them interchangeably and not be consistent in places. And I am also aware B is a vector and not a matrix, but I'm treating it like a matrix. I didn't feel the need to include the other methods since I know they're working properly, but I've posted them now. – keenns Oct 21 '17 at 03:39

2 Answers


In C, a 2D matrix built this way is an array of pointers, and since C is row major, each pointer points to a row (not a column).

That being said, MPI expects the rows to be in consecutive memory, and hence an array of pointers is not a good fit for MPI.

A static array is a good fit, but if you need a dynamic array, you have to allocate your matrix "all at once" and then manually build the array of pointers.

So you can replace

    A = malloc(rows * sizeof *A);
    for (i = 0; i < rows; i++)
    {
        A[i] = malloc(cols * sizeof *A[i]);
    }

with

    A = (int **)malloc(rows * sizeof(int *));
    A[0] = (int *)malloc(rows * cols * sizeof(int));
    for (i = 1; i < rows; i++)
    {
        A[i] = A[i-1] + cols;
    }

and then you will be able to broadcast your matrix with

MPI_Bcast(A[0], cols*rows, MPI_INT, 0, MPI_COMM_WORLD);

Please refer to MPI_Bcast a dynamic 2d array for a lengthy explanation

FWIW, I do not understand your logic for the C data. As far as I understand, it is a simple vector, so declare it as one and do not play with pointers (since MPI will not handle them correctly).
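
For example, a minimal sketch of that idea, assuming the program is launched with exactly N ranks (one per row) and reusing the MASTER constant from the question:

    int C[N];       // plain contiguous vector on every rank, no pointers involved
    int my_sum = 0;
    /* ... each rank computes the dot product of its row of A with B into my_sum ... */

    // every rank contributes one int; the root receives them into C in rank order
    MPI_Gather(&my_sum, 1, MPI_INT, C, 1, MPI_INT, MASTER, MPI_COMM_WORLD);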

Gilles Gouaillardet

Here is your program with the issues fixed:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "mpi.h"

#define PRINT_VECS 1
#define MAX_RAND 100
#define MASTER 0
#define COLUMNS_B 1
#define N 4

void init_vec(int *vec, int len);
void print_vec(const char *label, int *vec, int len);

void init_vec(int *vec, int len)
{
    int i;
    for (i = 0; i < len; i++)
    {
        vec[i] = rand() % MAX_RAND;
    }
}

void print_vec(const char *label, int *vec, int len)
{
#if PRINT_VECS
    printf("%s", label);
    int i;
    for (i = 0; i < len; i++)
    {
        printf("%d ", vec[i]);
    }
    printf("\n\n");
#endif
}

void init_matrix(int** matrix, int rows, int cols)
{
    int i,j;
    for (i = 0; i < rows; i++)
    {
        for (j = 0; j < cols; j++)
        {
            matrix[i][j] = rand() % MAX_RAND;
        }
    }
}

void print_matrix(int** matrix, int rows, int cols)
{
    int i;
    for (i = 0; i < rows; i++)
    {
        printf("|");
        int j;
        for (j = 0; j < cols; j++)
        {
            printf("%d ", matrix[i][j]);
        }
        printf("|\n");
    }
}


int main(int argc, char *argv[])
{
    int my_rank;
    int num_procs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); //grab this process's rank
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs); //grab the total num of processes

    int results[num_procs]; // used to store the partial sum computed
    int rows, cols, colsB, k;
    rows = N;
    cols = N;
    colsB = COLUMNS_B;
    int **A; // N x N Matrix
    int B[N]; // N x 1 Matrix
    int C[N]; // N x 1 Matrix

    // Allocate memory for the NxN matrix on all processes
    A = (int**) malloc(N * sizeof(int*));
    for(k=0;k<N;k++)
        A[k]= (int*) malloc(N * sizeof(int));

    double start_time; // use these for timing
    double stop_time;

    if (my_rank == MASTER)
    {
        printf("Number of processes: %d\n", num_procs);
        printf("N: %d\n", N);
        srand(time(NULL));

        // Initialize arrays on root only
        init_matrix(A, rows, cols);
        printf("Matrix A:\n");
        print_matrix(A, rows, cols);

        init_vec(B, N);
        print_vec("Matrix B:\n", B, N);

        start_time = MPI_Wtime();
    }

    // Be consistent with names vs. values to avoid bugs
    MPI_Bcast(B, N, MPI_INT, MASTER, MPI_COMM_WORLD);

    for (k=0; k<N; k++)
        MPI_Bcast(&(A[k][0]), N, MPI_INT, MASTER, MPI_COMM_WORLD);

    int row = my_rank;  
    int my_sum = 0;

    int i,num;
    if (my_rank < N)
    {
        for (i = 0; i < N; i++)
        {
            num = A[row][i] * B[i];
            my_sum = my_sum + num;
        }

        C[row] = my_sum;
        printf("HAI FROM PROCESS %d! I will calculate row %d. My calculation: %d\n", my_rank, row, my_sum);
   }

    MPI_Gather(&C[row], 1, MPI_INT, &C[row], 1, MPI_INT, MASTER, MPI_COMM_WORLD);

    if (my_rank == MASTER)
    {
        stop_time = MPI_Wtime();
        print_vec("Matrix C:\n", C, N);
        printf("Total time (sec): %f\n", stop_time - start_time);
    }

    // Free matrix A
    for(k=0;k<N;k++)
        free(A[k]);
    free(A);

    MPI_Finalize();

    return EXIT_SUCCESS;
}

As said in the comments, in this case you need to allocate memory for all your matrices on all your processes. That allocation is not the same as the initialization of A and B, which the program does only on the root process. Here A is allocated using malloc, while C is allocated statically and is further on used as a vector, in the same way as B. This is not strictly necessary, but it seems like the better choice since C is a 1D array and is in essence equivalent to B.

B is broadcast to all processes as before, but instead of 0 the program uses MASTER, so that if you ever change the value of MASTER all occurrences of it change with it. This is generally good programming practice and it is applied everywhere in the new code.

A is broadcast in a simpler but surely less efficient way than @Gilles Gouaillardet suggested: the program simply broadcasts every row of A separately,

for (k=0; k<N; k++)
    MPI_Bcast(&(A[k][0]), N, MPI_INT, MASTER, MPI_COMM_WORLD);

This works because of row-major ordering and the fact that the N elements of the kth row of A are stored contiguously. It would fail if A were sent by columns.
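
For reference, a rough sketch of the single-broadcast alternative, assuming A is backed by one contiguous block as in the other answer:

    // allocate one contiguous rows*cols block and point each row pointer into it
    A = malloc(rows * sizeof *A);
    A[0] = malloc(rows * cols * sizeof *A[0]);
    for (k = 1; k < rows; k++)
        A[k] = A[k-1] + cols;

    // the whole matrix now moves in a single broadcast
    MPI_Bcast(A[0], rows * cols, MPI_INT, MASTER, MPI_COMM_WORLD);

With that layout only A[0] and A themselves need to be freed at the end.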

The remaining changes are assigning the value of my_sum to C[row] rather than a pointer to it, C[row] = my_sum;, and the gathering operation:

MPI_Gather(&C[row], 1, MPI_INT, &C[row], 
       1, MPI_INT, MASTER, MPI_COMM_WORLD);

Here each process sends its value C[row] into the corresponding slot of C on the root process. C is then printed on the root using the print_vec function.
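
As a side note, if you want to avoid passing overlapping send and receive buffers on the root (the MPI standard provides MPI_IN_PLACE for exactly this case), one possible variant, again assuming exactly N processes, is:

    // the root's own contribution is already sitting in C[0], so it can pass
    // MPI_IN_PLACE instead of an overlapping send buffer (assumes num_procs == N)
    if (my_rank == MASTER)
        MPI_Gather(MPI_IN_PLACE, 1, MPI_INT, C, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
    else
        MPI_Gather(&my_sum, 1, MPI_INT, C, 1, MPI_INT, MASTER, MPI_COMM_WORLD);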

atru
  • Ooooh I understand a lot better now... I need to allocate A on all of the processes but only initialize on the master thread... I get it! Thank you so much for the clear explanation. – keenns Oct 21 '17 at 06:28
  • Glad to explain :) You need space for A on every process so you have space where A will be stored when it is sent from the root - that would be more or less the idea. – atru Oct 21 '17 at 06:39