
Why do I get the following error for the code below when run with `mpirun -np 2 ./out`? I call make_layout() after resizing the std::vector, so I should not get this error. It works if I do not resize. What is the reason?

main.cpp:

#include <iostream>
#include <vector>
#include "mpi.h"

MPI_Datatype MPI_CHILD;

struct Child
{
    std::vector<int> age;

    void make_layout();
};

void Child::make_layout()
{
    int nblock = 1;
    int age_size = age.size();
    int block_count[nblock] = {age_size};
    MPI_Datatype block_type[nblock] = {MPI_INT};
    MPI_Aint offset[nblock] = {0};
    MPI_Type_struct(nblock, block_count, offset, block_type, &MPI_CHILD);
    MPI_Type_commit(&MPI_CHILD);
}

int main()
{
    int rank, size;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);    

    Child kid;
    kid.age.resize(5);
    kid.make_layout();
    int datasize;
    MPI_Type_size(MPI_CHILD, &datasize);
    std::cout << datasize << std::endl; // output: 20 (5x4 seems OK).

    if (rank == 0)
    {
        MPI_Send(&kid, 1, MPI_CHILD, 1, 0, MPI_COMM_WORLD);
    }

    if (rank == 1)
    {
        MPI_Recv(&kid, 1, MPI_CHILD, 0, 0, MPI_COMM_WORLD, NULL);
    }

    MPI_Finalize();

    return 0;
}

Error message:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x14ae7b8
[ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7fe1ad91c3d0]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x22)[0x7fe1ad5c5a92]
[ 2] ./out[0x400de4]
[ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fe1ad562830]
[ 4] ./out[0x400ec9]
*** End of error message ***
Shibli
  • This is probably the worst MPI-related advice I've ever given, but you could overload the unary `Child::operator&` to return `age.data()`. – Hristo Iliev Oct 16 '16 at 19:29
  • `int nblock = 1;` should be `const int nblock = 1;` – M.M Oct 16 '16 at 23:18
  • @M.M it did not make any difference. – Shibli Oct 17 '16 at 09:08
  • @HristoIliev what if `age` is not the first member of `struct`? – Shibli Oct 17 '16 at 09:09
  • Then the offset of `age.data()` in the constructed MPI datatype should be relative to the absolute address of the first member of the structure. This applies to all other members of the structure. Or you can simply use absolute addresses as offsets and specify `MPI_BOTTOM` as the buffer address in `MPI_Send` / `MPI_Recv`. – Hristo Iliev Oct 17 '16 at 10:17

3 Answers


The problem here is that you're telling MPI to send a block of integers starting at &kid, but that's not where your data is. &kid points to the std::vector object itself, which holds an internal pointer to your block of integers allocated somewhere on the heap.

Replace &kid with kid.age.data() and it should work. The reason it "works" when you don't resize is that the vector has size 0, so MPI tries to send an empty message and no actual memory access takes place.

suszterpatt

Here is an example with several std::vector members that uses MPI datatypes with absolute addresses:

struct Child
{
    int foo;
    std::vector<float> bar;
    std::vector<int> baz;

    Child() : dtype(MPI_DATATYPE_NULL) {}
    ~Child() { if (dtype != MPI_DATATYPE_NULL) MPI_Type_free(&dtype); }

    MPI_Datatype mpi_dtype();
    void invalidate_dtype();

private:
    MPI_Datatype dtype;
    void make_dtype();
};

MPI_Datatype Child::mpi_dtype()
{
    if (dtype == MPI_DATATYPE_NULL)
        make_dtype();
    return dtype;
}

void Child::invalidate_dtype()
{
    if (dtype != MPI_DATATYPE_NULL)
        MPI_Type_free(&dtype);
}

void Child::make_dtype()
{
    const int nblock = 3;
    int block_count[nblock] = {1, (int)bar.size(), (int)baz.size()};
    MPI_Datatype block_type[nblock] = {MPI_INT, MPI_FLOAT, MPI_INT};
    MPI_Aint offset[nblock];
    MPI_Get_address(&foo, &offset[0]);
    MPI_Get_address(&bar[0], &offset[1]);
    MPI_Get_address(&baz[0], &offset[2]);

    MPI_Type_create_struct(nblock, block_count, offset, block_type, &dtype);
    MPI_Type_commit(&dtype);
}

Sample use of that class:

Child kid;
kid.foo = 5;
kid.bar.resize(5);
kid.baz.resize(10);

if (rank == 0)
{
    MPI_Send(MPI_BOTTOM, 1, kid.mpi_dtype(), 1, 0, MPI_COMM_WORLD);
}

if (rank == 1)
{
    MPI_Recv(MPI_BOTTOM, 1, kid.mpi_dtype(), 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

Notice the use of MPI_BOTTOM as the buffer address. MPI_BOTTOM specifies the bottom of the address space, which is 0 on architectures with flat address space. Since the offsets passed to MPI_Type_create_struct are the absolute addresses of the structure members, when those are added to 0, the result is again the absolute address of each structure member. Child::mpi_dtype() returns a lazily constructed MPI datatype specific to that instance.

Since resize() reallocates memory, which could result in the data being moved to a different location in memory, the invalidate_dtype() method should be used to force the recreation of the MPI datatype after resize() or any other operation that might trigger memory reallocation:

// ...
kid.bar.resize(100);
kid.invalidate_dtype();
// MPI_Send / MPI_Recv

Please excuse any sloppy C++ code above.

Hristo Iliev
  • Great. Is this always the way to go if an STL container exists in struct/class? I searched how people send class including STL containers but could not find anything. They only show how to send container alone. – Shibli Oct 18 '16 at 08:53
  • This only works for containers that store their elements in contiguous memory. It won't work with linked lists or sets. For more generic way to pass C++ objects around with MPI, you should look into [boost.MPI](http://www.boost.org/libs/mpi). It has a fairly generic serialisation mechanism that supports complex data structures. – Hristo Iliev Oct 18 '16 at 09:01

Be careful: you are facing several problems here.

First, std::vector stores its elements on the heap, so the data is not actually stored inside your struct.

Second, you cannot safely pass raw STL containers even between dynamic libraries, and the same is true between application instances, because they may be compiled against different STL versions and behave differently on different architectures.

Here is good answer about this part of question: https://stackoverflow.com/a/22797419/440168

k06a
  • 1
    The second part does not apply to this question. The OP is defining an MPI datatype that maps to a sequence of `age.size()` elements of integer type stored contiguously in memory, which is exactly what `std::vector` is. With MPI as the middleware, it will not only work between random processes (or app instances as you call it), but also between processes on different architectures (if the MPI implementation supports heterogeneous environments). – Hristo Iliev Oct 16 '16 at 19:35