
I quit. This is the most frustrating thing I've ever had to do. Even a non-dynamic int array causes a segfault, but if I declare it as a float/char/whatever array, it works fine.


Update: If I remove the line `MPI_Scatter(A[0], N, MPI_INT, A_row, N, MPI_INT, 0, MPI_COMM_WORLD);`, it works fine. The problem is that I need it...


I'm working on a program but I have a bizarre problem.

The following code works fine (assuming that `N` is a multiple of `p`):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"


void main(int argc, char** argv)  
{
   int my_rank, p, N, **A, *diagonals, *A_row;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank == 0)  {

        N = 4;
        int *mem = malloc(N * N * sizeof(int));
        A = malloc(N * sizeof(int*));
        for(int i = 0; i < N; i++) 
            A[i] = mem + N*i;    

    }
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

    A_row = malloc (N * sizeof(int));

    MPI_Scatter(A[0], N, MPI_INT, A_row, N, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
}

However, I need to allocate another array (diagonals), like this:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"


void main(int argc, char** argv)  
{
   int my_rank, p, N, **A, *diagonals, *A_row;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank == 0)  {

        N = 4;
        int *mem = malloc(N * N * sizeof(int));
        A = malloc(N * sizeof(int*));
        for(int i = 0; i < N; i++) 
            A[i] = mem + N*i;

        diagonals = malloc (N * sizeof(int));    
    }
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

    A_row = malloc (N * sizeof(int));

    MPI_Scatter(A[0], N, MPI_INT, A_row, N, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
}

I get this segmentation fault (if it helps at all):

[teo-VirtualBox:02582] *** Process received signal ***
[teo-VirtualBox:02582] Signal: Segmentation fault (11)
[teo-VirtualBox:02582] Signal code: Address not mapped (1)
[teo-VirtualBox:02582] Failing at address: 0x1
[teo-VirtualBox:02582] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7faecc8d23d0]
[teo-VirtualBox:02582] [ 1] a[0x400c85]
[teo-VirtualBox:02582] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7faecc511830]
[teo-VirtualBox:02582] [ 3] a[0x4009a9]
[teo-VirtualBox:02582] *** End of error message ***

Am I missing something obvious?

By the way, I'm not using free(), or doing anything specific because this is not the complete code. It's just a side file that I created for testing.

Teo Alivanoglou
  • For the future, load the program with GDB, and it will tell you what line seg faulted, and you can test the values of certain variables. – Dellowar Dec 09 '16 at 19:05
  • @SanchkeDellowar I'm not familiar with gdb. I tried `mpiexec -np 4 a.out -gdb` but it doesn't make any difference in the error message. – Teo Alivanoglou Dec 09 '16 at 19:11
  • @TeoAlivanoglou Could you try and whittle your code down to a minimum verifiable example i.e. take out everything (mainly the mpi stuff) that you don't need so that we can try running it ourselves? – gowrath Dec 09 '16 at 19:16
  • @gowrath just edited it – Teo Alivanoglou Dec 09 '16 at 19:23
  • You should check what `malloc` returns to make sure it's not returning NULL. That being said, you're not asking for a whole lot of memory here, so I wouldn't expect any NULLs returned. – yano Dec 09 '16 at 19:42
  • are you sure `my_rank` always equals 0? If not, `A` never gets `malloc`ed, and the call to `MPI_Scatter` will be broken. Same for `N`: the `malloc` for `A_row` will be called with junk for `N` unless `my_rank==0` (although I don't know what `MPI_Bcast` does with `N`). – yano Dec 09 '16 at 20:02
  • @yano `MPI_Bcast` broadcasts some value to all processes (in my case, `N`). And `MPI_Scatter` sends stuff from `**A` to the other processes, and saves it in `*A_row` – Teo Alivanoglou Dec 09 '16 at 20:10
  • If it's broadcasting the value of `N` I find it odd that it wants the address of `N`. But that just reinforces my thought. If `MPI_Bcast` isn't setting the value of `N`, then you have undefined behavior starting at `A_row = malloc(..);` if `my_rank != 0`. If `my_rank` always equals 0, then why even check? Otherwise, you should have an `else` condition that handles that, or everything involving `N` and `A` should be moved into the `if (my_rank==0)` block. – yano Dec 09 '16 at 20:18
  • @yano First of all, [MPI_Bcast](http://www.mpich.org/static/docs/v3.2/www3/MPI_Bcast.html). It seems to me that you don't really understand how MPI works. Everything that is outside of the `my_rank == 0` block gets executed by all processes. – Teo Alivanoglou Dec 09 '16 at 20:23
  • no I don't know a thing about MPI. That's irrelevant for what I've suggested. Unless `my_rank==0`, you're going to use uninitialized memory, which is UB (see the sketch after these comments). – yano Dec 09 '16 at 20:29
  • @yano incorrect. Only the process with `rank == 0` accesses `**A`, and everything else (except for `diagonals`) is initialized in all processes – Teo Alivanoglou Dec 09 '16 at 20:45
  • You say the *Segmentation fault seems to occur before `malloc()`* but at the same time it works if you remove the `MPI_Scatter`? So did you extend the code to check the return values of all `malloc` calls? In any case, guessing is not productive, please [use a debugger (correctly)](https://www.open-mpi.org/faq/?category=debugging). – Zulan Dec 09 '16 at 22:00
  • fair enough .. I'll yield to MPI mystery black box ... hope you figure it out – yano Dec 09 '16 at 22:02
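To illustrate the uninitialized-memory concern raised in the comments above: on every rank where `my_rank != 0`, the pointer `A` is never assigned, so the expression `A[0]` in the `MPI_Scatter` call dereferences an uninitialized pointer before MPI even sees the argument. A sketch of one possible guard (the send buffer is significant only at the root, so any placeholder such as `NULL` works on the other ranks):

/* Only rank 0 owns A; the other ranks must not evaluate A[0]. */
int *sendbuf = (my_rank == 0) ? A[0] : NULL;
MPI_Scatter(sendbuf, N, MPI_INT, A_row, N, MPI_INT, 0, MPI_COMM_WORLD);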

1 Answer


To be honest, I cannot reproduce:

linux21:/home/users/grad1459/Desktop/parallel>mpicc -Wall -std=c99 main.c
main.c: In function ‘main’:
main.c:9:15: warning: unused variable ‘status’ [-Wunused-variable]
main.c:8:29: warning: variable ‘diagonals’ set but not used [-Wunused-but-set-variable]
linux21:/home/users/grad1459/Desktop/parallel>mpiexec -np 4 a.out
ALL OK
ALL OK
ALL OK
ALL OK
linux21:/home/users/grad1459/Desktop/parallel>

with very similar code to yours:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"


int main(int argc, char** argv)  
{
   int my_rank, p, N, **A, *diagonals, *A_row;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank == 0)  {

        N = 4;
        int *mem = malloc(N * N * sizeof(int));
        A = malloc(N * sizeof(int*));
        for(int i = 0; i < N; i++) 
            A[i] = mem + N*i;

        diagonals = malloc (N * sizeof(int));
    }
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

    A_row = malloc (N * sizeof(int));

    MPI_Scatter(A[0], N, MPI_INT, A_row, N, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    printf("ALL OK\n");
    return 0;
}

As a result, I think that your VirtualBox has some memory limitation and your `malloc()` fails. Check its return value to make sure it's not NULL, as described in How detect malloc failure?
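For instance, a minimal sketch of that check inside the rank-0 block (the message and the error code passed to `MPI_Abort` are just placeholders):

int *mem = malloc(N * N * sizeof(int));
if (mem == NULL) {
    /* allocation failed: report it and abort every rank, not just this one */
    fprintf(stderr, "malloc of %d ints failed\n", N * N);
    MPI_Abort(MPI_COMM_WORLD, 1);
}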

Here is my version:

linux21:/home/users/grad1459/Desktop/parallel>mpiexec --version
HYDRA build details:
    Version:                                 3.1.3
    Release Date:                            Wed Oct  8 09:37:19 CDT 2014
    CC:                              gcc    
    CXX:                             g++    
    F77:                             gfortran   
    F90:                             gfortran   
    Configure options:                       '--disable-option-checking' '--prefix=/usr/local/mpich3' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= ' 'LIBS=-lpthread ' 'CPPFLAGS= -I/usr/local/USB/mpich-3.1.3/src/mpl/include -I/usr/local/USB/mpich-3.1.3/src/mpl/include -I/usr/local/USB/mpich-3.1.3/src/openpa/src -I/usr/local/USB/mpich-3.1.3/src/openpa/src -D_REENTRANT -I/usr/local/USB/mpich-3.1.3/src/mpi/romio/include'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:       
    Demux engines available:                 poll select

Maybe the problem is that you don't free() your memory? Did you try that?

In general, when using MPI, try allocating the 2D dynamic array in contiguous memory cells (so that MPI can freely use its stride, etc.). You can do this with these functions:

/* Allocate an M-by-N int matrix whose data sits in one contiguous block.
   The passed-in value of A is ignored; use the returned pointer. */
int** allocate2D(int** A, const int N, const int M) {
    int i;
    int *t0;

    A = malloc(M * sizeof(int*));      /* allocate the row pointers */
    t0 = malloc(N * M * sizeof(int));  /* allocate the data in one block */
    for (i = 0; i < M; i++)
        A[i] = t0 + i * N;

    return A;
}

/* N is unused; it is kept only for symmetry with allocate2D. */
void free2Darray(int** p, const int N) {
    free(p[0]);  /* frees the whole contiguous data block */
    free(p);     /* frees the row pointers */
}

as I explain in 2D dynamic array in continuous memory locations (C).
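For example, a hypothetical usage sketch with the question's names (`N`, `A_row`); error checking is omitted and, as in the question, only the root rank would own `A`:

int **A = NULL;
A = allocate2D(A, N, N);   /* N x N matrix; the data lives in one contiguous block */
/* ... fill A ... */
MPI_Scatter(A[0], N, MPI_INT, A_row, N, MPI_INT, 0, MPI_COMM_WORLD);
free2Darray(A, N);         /* release the data block and the row pointers */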


Unrelated to your runtime error: see Why do we need to use `int main` and not `void main` in C++? The same applies in C.

gsamaras
  • Isn't it the same as what I've done, just `mem` instead of `t0`? Also, thanks for the int-vs-void page. And what could I free without messing up my data? This program obviously does more things than that. – Teo Alivanoglou Dec 09 '16 at 19:33
  • Oops @TeoAlivanoglou, you do pretty much the same thing, but with the code lines swapped, sorry. Well, you generally `free()` your memory when you no longer need it, in other words when you don't care about the data. You could try that now to see if the complete minimal example runs. I tried it on my university's machines and it worked fine, even without free'ing the memory! – gsamaras Dec 09 '16 at 19:40
  • Wow. I even copy-pasted your code, just to make sure it's 100% the same. I still get the error. Is there any chance that my mpi or gcc are broken? Or maybe VirtualBox limitations? – Teo Alivanoglou Dec 09 '16 at 19:43
  • I think VirtualBox limitation @TeoAlivanoglou, see my update. – gsamaras Dec 09 '16 at 19:45
  • I can't check if malloc succeeded. The segmentation fault seems to occur before `malloc()` returns. – Teo Alivanoglou Dec 09 '16 at 19:51
  • @TeoAlivanoglou Do you even have to `malloc`? In the code you've shown, you don't. You know `N`, so you know the size of everything at compile time. – yano Dec 09 '16 at 20:03
  • @yano I normally don't know `N`. It has to be dynamic. But I used `N=4;` just for testing purposes. – Teo Alivanoglou Dec 09 '16 at 20:08
  • @gsamaras I tried it in a normal linux machine. Still no luck. Any chance you can tell me which implementation of mpi you're using? – Teo Alivanoglou Dec 09 '16 at 20:39
  • @TeoAlivanoglou I updated my answer, good luck. I will upvote your question so that more people come into play; if you find the answer helpful, do the same. :) – gsamaras Dec 09 '16 at 20:50
  • @gsamaras Thank you. I'm installing `mpich` as well. I have a feeling it's all `openmpi`'s fault – Teo Alivanoglou Dec 09 '16 at 21:01
  • Nope. Same with mpich. – Teo Alivanoglou Dec 09 '16 at 21:10