I am trying to communicate a std::vector<MyClass> of varying size via MPI. MyClass contains vector members that may be empty or vary in size. To do that, I wrote serialize() and deserialize() functions that write such a std::vector<MyClass> to a std::string and read it back; the string is what I then communicate via MPI.

class MyClass {
    ...
    int some_int_member;
    std::vector<float> some_vector_member;
};

std::vector<MyClass> deserialize(const std::string &in) {
    std::istringstream iss(in);

    size_t total_size;
    iss.read(reinterpret_cast<char *>(&total_size), sizeof(total_size));

    std::vector<MyClass> out_vec;
    out_vec.resize(total_size);

    for(MyClass &d: out_vec) {
        size_t v_size;
        iss.read(reinterpret_cast<char *>(&d.some_int_member), sizeof(d.some_int_member));
        iss.read(reinterpret_cast<char *>(&v_size), sizeof(v_size));
        d.some_vector_member.resize(v_size);
        // data() is valid even when the vector is empty; &v[0] is not
        iss.read(reinterpret_cast<char *>(d.some_vector_member.data()), v_size * sizeof(float));
    }

    return out_vec;
}


std::string serialize(std::vector<MyClass> &data) {
    std::ostringstream os;

    size_t total_size = data.size();
    os.write(reinterpret_cast<char *>(&total_size), sizeof(total_size));

    for(MyClass &d: data) {
        size_t v_size = d.some_vector_member.size();
        os.write(reinterpret_cast<char *>(&d.some_int_member), sizeof(d.some_int_member));
        os.write(reinterpret_cast<char *>(&v_size), sizeof(v_size));
        // data() is valid even when the vector is empty; &v[0] is not
        os.write(reinterpret_cast<char *>(d.some_vector_member.data()), v_size * sizeof(float));
    }
    return os.str();
}

My implementation works in principle, but sometimes (not always!) MPI processes crash at positions that I think are related to the serialization. The payload sent can be as big as hundreds of MB. I suspect that std::string is not a good choice of container. Are there limitations to using std::string as a container for char[] with huge binary data that I may be running into here?

(Note that I don't want to use boost::mpi with its serialization routines, nor do I want to pull a huge library such as cereal into my project.)

janoliver
  • I'm not really sure what you want from an answer. Is this about the crashes? Then we need an [mcve] and a description of your present debugging efforts. Or is this about how to do serialization properly? Or is this how to send compound C++ objects with MPI (serialization is only one answer to that)? If this is really about "*implement the sending of serialized data via MPI*", then at the very least we'd need to see your MPI code. Many of those questions are also heavy on opinion ("*better way*") please focus the question on specific goals and criteria. – Zulan Jun 06 '17 at 09:16
  • Hi Zulan, I'm sorry that the question is not very precise. I'll try to rephrase it. I am using the above routines in a large numerical simulation which _sometimes_ crashes, even after many MPI requests have completed without problems. The stack trace is not very helpful (it contains `bad_alloc`, so I guess it's some memory issue), and I am not easily able to create a minimal working example. I suspected some limitation of `std::string` to be the issue, hence my question. – janoliver Jun 06 '17 at 09:21
  • You could try to encapsulate your serialize method in a `try {...} catch(std::bad_alloc&) { ... }` block. Moreover, you could use a memory profiler to analyze memory leaks. – Simone Cifani Jun 06 '17 at 12:16
  • Although strings might work, MPI provides its own portable ways to serialise (pack) data using `MPI_Pack` and `MPI_Unpack`. Also, you might want to look into the implementation of serialisation in Boost.MPI. – Hristo Iliev Jun 07 '17 at 07:54

1 Answer

Generally, using std::string for binary data is fine, although some people might prefer std::vector<char>, or std::vector<std::byte> in C++17. (Since C++11, std::string is guaranteed to store its data contiguously.) There are two significant efficiency issues in your code:

  1. You always have three copies of the whole data. The original objects, the serialized string and the intermediate [io]stringstream.
  2. You cannot pre-allocate (reserve) data in ostringstream, which may lead to over-allocation and frequent reallocation.

Hence, you waste a significant amount of memory, which might contribute to bad_alloc. That said, it may be perfectly fine and you just have a memory leak somewhere. It's impossible to tell if this is a practical issue for you without knowing the cause of the bad_alloc and a performance analysis of your application.
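
One way to avoid both issues is to compute the serialized size up front, reserve that much in a std::vector<char>, and copy the bytes in directly, with no stringstream in between. The following is only a minimal sketch under stated assumptions: MyClass is reduced to the two members shown in the question, and the names serialize_to_buffer, deserialize_from_buffer, append_raw and read_raw are hypothetical helpers, not part of any library. The reader also checks every length against the bytes that remain, so a truncated or corrupted buffer throws std::runtime_error instead of triggering a huge resize().

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Simplified stand-in for the question's MyClass (assumption: only these two members).
struct MyClass {
    int some_int_member = 0;
    std::vector<float> some_vector_member;
};

// Hypothetical helper: append the raw bytes of a trivially copyable value.
template <typename T>
void append_raw(std::vector<char> &buf, const T &value) {
    const char *p = reinterpret_cast<const char *>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Hypothetical helper: read a value at pos and advance pos, with a bounds check.
template <typename T>
T read_raw(const std::vector<char> &buf, std::size_t &pos) {
    if (sizeof(T) > buf.size() - pos) throw std::runtime_error("truncated buffer");
    T value;
    std::memcpy(&value, buf.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

std::vector<char> serialize_to_buffer(const std::vector<MyClass> &data) {
    // First pass: compute the exact size so the buffer is allocated only once.
    std::size_t bytes = sizeof(std::size_t);
    for (const MyClass &d : data)
        bytes += sizeof(int) + sizeof(std::size_t)
               + d.some_vector_member.size() * sizeof(float);

    std::vector<char> buf;
    buf.reserve(bytes);

    const std::size_t count = data.size();
    append_raw(buf, count);
    for (const MyClass &d : data) {
        append_raw(buf, d.some_int_member);
        const std::size_t n = d.some_vector_member.size();
        append_raw(buf, n);
        // data() may be null for an empty vector; an empty range is still valid.
        const char *p = reinterpret_cast<const char *>(d.some_vector_member.data());
        buf.insert(buf.end(), p, p + n * sizeof(float));
    }
    return buf;
}

std::vector<MyClass> deserialize_from_buffer(const std::vector<char> &buf) {
    std::size_t pos = 0;
    const std::size_t count = read_raw<std::size_t>(buf, pos);

    // Every element needs at least an int and a length field.
    const std::size_t min_elem = sizeof(int) + sizeof(std::size_t);
    if (count > (buf.size() - pos) / min_elem)
        throw std::runtime_error("element count exceeds buffer size");

    std::vector<MyClass> out(count);
    for (MyClass &d : out) {
        d.some_int_member = read_raw<int>(buf, pos);
        const std::size_t n = read_raw<std::size_t>(buf, pos);
        if (n > (buf.size() - pos) / sizeof(float))
            throw std::runtime_error("vector length exceeds buffer size");
        d.some_vector_member.resize(n);
        if (n > 0)
            std::memcpy(d.some_vector_member.data(), buf.data() + pos, n * sizeof(float));
        pos += n * sizeof(float);
    }
    return out;
}
```

With this layout, peak memory is the original objects plus one buffer of exactly the serialized size. Like the code in the question, it writes size_t, int and float in native representation, so it assumes all ranks share the same architecture.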

Zulan