I'm currently writing a simulation using boost::mpi on top of Open MPI, and everything works fine at small scale. However, once I scale up the system and have to send larger std::vectors, I get errors.

I've reduced the issue to the following problem:

#include <boost/mpi.hpp>
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>
namespace mpi = boost::mpi;

int main() {
    mpi::environment env;
    mpi::communicator world;

    std::vector<char> a;
    std::vector<char> b;
    if (world.rank() == 0) {
        for (size_t i = 1; i < 1E10; i *= 2) {
            a.resize(i);
            std::cout << "a " << a.size();
            world.isend(0, 0, a);
            world.recv(0, 0, b);
            std::cout << "\tB " << b.size() << std::endl;
        }
    }
    return 0;
}

prints out:

a 1 B 1
a 2 B 2
a 4 B 4
....
a 16384 B 16384
a 32768 B 32768
a 65536 B 65536
a 131072    B 0
a 262144    B 0
a 524288    B 0
a 1048576   B 0
a 2097152   B 0

I'm aware that there is a limit to the size of an MPI message, but 64 KiB (65536 bytes) seems a little low to me. Is there a way of sending larger messages?

tik
  • According to [this](http://stackoverflow.com/questions/13558861/maximum-amount-of-data-that-can-be-sent-using-mpisend) you should not even be close to the max. message size. No idea what's going wrong here though. – Baum mit Augen Jan 15 '15 at 15:38
  • What happens if you change `isend` to `send`? It could be that the non blocking send is causing an issue. – NathanOliver Jan 15 '15 at 16:04
  • @NathanOliver : If I change the isend to send, it just stops (blocks) after writing the a 65536 B 65536 line. – tik Jan 15 '15 at 16:08
  • @tk - can you query the `status` that is returned by `recv`? That might point you in a direction. – NathanOliver Jan 15 '15 at 16:16
  • @NathanOliver Ok, I tried that: status.error() always returns 0. – tik Jan 15 '15 at 16:33
  • With it stopping at 65K, the only other thing I can think of is that there is some sort of thread-local storage going on. – NathanOliver Jan 15 '15 at 17:49
  • Although this is not a correct MPI program (you are not waiting on or testing the request returned by `isend()`), it must be a bug in `boost.mpi`. And yes, it is supposed to block when `isend()` is replaced by `send()` and the message size is above the internal eager limit. – Hristo Iliev Jan 16 '15 at 17:13

1 Answer


The message size limit is the same as for MPI_Send: INT_MAX.

The problem is that you do not wait for the isend to finish before resizing the vector a in the next iteration. boost::mpi takes the buffer a by reference, so you must not modify a until the isend operation has completed. When a.resize() reallocates the vector's storage, the pending isend reads from memory that is no longer valid.
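The reallocation hazard can be illustrated without any MPI at all. The following is a minimal sketch; resize_moves_storage is a hypothetical helper written for this illustration and is not part of boost::mpi or the standard library. It only reports whether a resize had to move the vector's storage, which is exactly the situation in which a pointer held by a pending isend would be left dangling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of boost::mpi): resizes `v` to `new_size`
// and reports whether the vector had to reallocate its storage to do so.
bool resize_moves_storage(std::vector<char>& v, std::size_t new_size) {
    // std::vector reallocates whenever the requested size exceeds the
    // current capacity. A reallocation invalidates every pointer into the
    // old storage, including one held by an unfinished non-blocking send.
    const bool must_reallocate = new_size > v.capacity();
    v.resize(new_size);
    assert(v.size() == new_size);
    return must_reallocate;
}
```

In the loop from the question the size doubles on every iteration, so a helper like this would typically report a reallocation each time: every a.resize(i) moves the storage that the previous isend may still be reading from.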

If you run your program with valgrind, you will see invalid reads as soon as i = 131072.

Your program works up to 65536 bytes because Open MPI sends a message directly (eagerly) if it is smaller than the component's btl_eager_limit. For the self component (sending to the own process), this limit happens to be 128*1024 bytes. Since boost::serialization adds the size of the std::vector to the byte stream, you exceed this eager limit as soon as you use 128*1024 = 131072 as your input size.
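The arithmetic behind that threshold can be sketched as follows. The 128*1024 limit is the value quoted above for the self component. The header size is an assumption made purely for illustration (the real overhead added by boost::serialization was not measured here), and sent_eagerly is a hypothetical name, not an Open MPI or boost::mpi function.

```cpp
#include <cstddef>

// Eager limit of the self component, as quoted in the answer.
constexpr std::size_t kSelfEagerLimit = 128 * 1024;

// ASSUMPTION: illustrative size of the length prefix that serialization
// adds in front of the vector's bytes. Any value greater than zero leads
// to the same conclusion for the sizes tested in the question.
constexpr std::size_t kAssumedHeaderBytes = 8;

// Hypothetical helper: true if payload plus header stays strictly below
// the eager limit, i.e. the message would be sent directly.
constexpr bool sent_eagerly(std::size_t payload_bytes,
                            std::size_t header_bytes = kAssumedHeaderBytes,
                            std::size_t limit = kSelfEagerLimit) {
    return payload_bytes + header_bytes < limit;
}

// 65536 bytes plus a small header is still below the limit ...
static_assert(sent_eagerly(65536), "64 KiB payload fits under the limit");
// ... but 131072 bytes is not, which matches the first failing line.
static_assert(!sent_eagerly(131072), "128 KiB payload exceeds the limit");
```

Because the sizes in the question double on each iteration, 65536 is the last size below the limit and 131072 is the first one at or above it, whatever the exact header size is.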

To fix your code, save the boost::mpi::request return value from isend() and then add wait() to the end of the loop:

#include <boost/mpi.hpp>
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>
namespace mpi = boost::mpi;

int main() {
    mpi::environment env;
    mpi::communicator world;

    std::vector<char> a;
    std::vector<char> b;
    if (world.rank() == 0) {
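        for (size_t i = 1; i < 1E9; i *= 2) {
            a.resize(i);
            std::cout << "a " << a.size();
            // Keep the request so the send can be completed explicitly.
            mpi::request req = world.isend(0, 0, a);
            world.recv(0, 0, b);
            std::cout << "\tB " << b.size() << std::endl;
            // Do not touch a again until the isend has finished.
            req.wait();
        }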
    }
    return 0;
}
Patrick