6

Since MPI doesn't offer binary compatibility, only source compatibility, we're forced to ship our solver's source code to customers so they can use the solver with their preferred version of MPI. We have now reached the point where we can no longer offer source code.

As a result, I'm looking into ways to create a wrapper around MPI calls. The idea is for us to provide a header of stub functions, and the user would write the implementation, create a dynamic library out of it, and then our solver would load it at runtime.
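
For illustration, the loading side in our solver would be something along these lines (assuming a POSIX dlopen/dlsym environment; the library and symbol names below are just placeholders):

#include <dlfcn.h>
#include <stdio.h>

// Matches the stub signature from the header below
typedef int (*my_send_fn)(const void *buf, int count, void *datatype,
                          int dest, int tag, void *comm);

static my_send_fn load_my_send(void)
{
    // The user builds libmy_mpi_wrapper.so against their own MPI
    void *lib = dlopen("libmy_mpi_wrapper.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    return (my_send_fn)dlsym(lib, "my_stub_mpi_send");
}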

But the solutions I've come up with aren't elegant and are prone to errors. Because there are struct arguments (say, MPI_Request) whose definitions may differ from one MPI implementation to another, we would have to accept void* for many of our stub arguments. And if the number of arguments can differ from one MPI to another (which I'm not sure is guaranteed to never happen), then the only way around that is varargs.

//header (provided by us)
int my_stub_mpi_send(const void *buf, int count, void* datatype,
        int dest, int tag, void* comm);

//*.c (provided by user)
#include <my_stub_mpi.h>
#include <mpi.h>

int my_stub_mpi_send(const void *buf, int count, void* datatype,
        int dest, int tag, void* comm)
{
    return MPI_Send(buf, count, *((MPI_Datatype*) datatype),
            dest, tag, *((MPI_Comm*) comm));
}
//Notes: (1) Most likely the interface will be C, not C++,
//           unless I can make a convincing case for C++;
//       (2) The goal here is to avoid void* pointers, if possible;

My question is: does anyone know of a way around those issues?

jxh
blue scorpion
    How about you illustrate an attempt in code, and then define the specific problem that causes the attempt to not work? Otherwise, this seems an obvious application of the Bridge Pattern. – jxh Jul 18 '16 at 17:14
  • '//header (provided by us) int my_stub_mpi_send(const void *buf, int count, void* datatype, int dest, int tag, void* comm); //*.c (provided by user) #include <my_stub_mpi.h> #include <mpi.h> int my_stub_mpi_send(const void *buf, int count, void* datatype, int dest, int tag, void* comm) { return MPI_Send(buf, count, *((MPI_Datatype*) datatype), dest, tag, *((MPI_Comm*) comm)); } //Notes: (1) Most likely the interface will be C, not C++, unless I can make a convincing case for C++; // (2) The goal here is to avoid void* pointers, if possible;' – blue scorpion Jul 18 '16 at 17:41
  • The main issue here is that MPI types vary from one implementation to another. In a C++ approach I'd use template stub function declarations. But even that is a problem, because then the implementation (*.cpp file) would have to explicitly instantiate those, since their definitions are not present in the header file. I'd rather stay away from explicit template instantiations because they can explode exponentially. It'd be nice to use Bridge or Adapter design patterns, but that assumes common abstract bases among all MPI implementations for each MPI type, which is probably too much to assume. – blue scorpion Jul 18 '16 at 18:04
  • Thanks for the edit, jxh. Actually, I think the Bridge / Adaptor solution is elegant enough and avoids shoveling void* everywhere. I'll just use empty base classes that the interface implementer would inherit from, add an implementation (via pImpl idiom, say) and unpack it at run-time. Thank you. – blue scorpion Jul 18 '16 at 18:35
  • One PMPI wrapper implementation (see answers) can be found in: https://github.com/UIUC-PPL/PMPI_Projections – user2864740 Apr 19 '20 at 03:34

3 Answers

3

If you are only targeting platforms that support the PMPI profiling interface, then there is a generic solution that requires minimal to no changes in the original source code. The basic idea is to (ab-)use the PMPI interface for the wrapper. It is probably in some non-OO sense an implementation of the bridge pattern.

First, several observations. There is a single structure type defined in the MPI standard and that is MPI_Status. It has only three publicly visible fields: MPI_SOURCE, MPI_TAG, and MPI_ERROR. No MPI function takes MPI_Status by value. The standard also defines the following opaque types: MPI_Aint, MPI_Count, MPI_Offset, and MPI_Status (plus several Fortran interoperability types, dropped here for clarity). The first three are integral. Then there are 10 handle types, from MPI_Comm to MPI_Win. Handles can be implemented either as special integer values or as pointers to internal data structures. MPICH and other implementations based on it take the first approach, while Open MPI takes the second. Being either a pointer or an integer, a handle of any kind fits within a single C datatype, namely intptr_t.

The basic idea is to override all MPI functions and redefine their arguments to be of an intptr_t type, then have the user-compiled code do the transition to the proper type and make the actual MPI call:

In mytypes.h:

#include <stdint.h>   // for intptr_t

typedef intptr_t my_MPI_Datatype;
typedef intptr_t my_MPI_Comm;

In mympi.h:

#include "mytypes.h"

// Redefine all MPI handle types
#define MPI_Datatype my_MPI_Datatype
#define MPI_Comm     my_MPI_Comm

// Those hold the actual values of some MPI constants
extern MPI_Comm     my_MPI_COMM_WORLD;
extern MPI_Datatype my_MPI_INT;

// Redefine the MPI constants to use our symbols
#define MPI_COMM_WORLD my_MPI_COMM_WORLD
#define MPI_INT        my_MPI_INT

// Redeclare the MPI interface
extern int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);

In mpiwrap.c:

#include <mpi.h>
#include "mytypes.h"

my_MPI_Comm my_MPI_COMM_WORLD;
my_MPI_Datatype my_MPI_INT;

int MPI_Init(int *argc, char ***argv)
{
   // Initialise the actual MPI implementation
   int res = PMPI_Init(argc, argv);
   my_MPI_COMM_WORLD = (intptr_t)MPI_COMM_WORLD;
   my_MPI_INT = (intptr_t)MPI_INT;
   return res;
}

int MPI_Send(void *buf, int count, intptr_t datatype, int dest, int tag, intptr_t comm)
{
   return PMPI_Send(buf, count, (MPI_Datatype)datatype, dest, tag, (MPI_Comm)comm);
}
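
If you want to verify at build time that the handles of the actual MPI implementation really fit in intptr_t, mpiwrap.c could also carry a compile-time check (a small sketch, assuming a C11 compiler with _Static_assert):

#include <stdint.h>

// Fails the build if a handle type is wider than intptr_t
_Static_assert(sizeof(MPI_Comm) <= sizeof(intptr_t),
               "MPI_Comm does not fit in intptr_t");
_Static_assert(sizeof(MPI_Datatype) <= sizeof(intptr_t),
               "MPI_Datatype does not fit in intptr_t");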

In your code:

#include "mympi.h" // instead of mpi.h

...
MPI_Init(NULL, NULL);
...
MPI_Send(buf, 10, MPI_INT, 1, 10, MPI_COMM_WORLD);
...

The MPI wrapper can either be linked statically or preloaded dynamically. Both ways work as long as the MPI implementation uses weak symbols for the PMPI interface. You can extend the above code example to cover all the MPI functions and constants used. All constants should be saved in the wrapper of MPI_Init / MPI_Init_thread.
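
Covering another call follows exactly the same pattern; MPI_Comm_rank here is just an illustration:

// Also in mpiwrap.c
int MPI_Comm_rank(intptr_t comm, int *rank)
{
   return PMPI_Comm_rank((MPI_Comm)comm, rank);
}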

Handling MPI_Status is somewhat more convoluted. Although the standard defines the public fields, it says nothing about their order or their placement within the structure. And once again, MPICH and Open MPI differ significantly:

// MPICH (Intel MPI)
typedef struct MPI_Status {
    int count_lo;
    int count_hi_and_cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;

// Open MPI
struct ompi_status_public_t {
    /* These fields are publicly defined in the MPI specification.
       User applications may freely read from these fields. */
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    /* The following two fields are internal to the Open MPI
       implementation and should not be accessed by MPI applications.
       They are subject to change at any time.  These are not the
       droids you're looking for. */
    int _cancelled;
    size_t _ucount;
};

If you only use MPI_Status to get information out of calls such as MPI_Recv, then it is trivial to copy the three public fields into a user-defined static structure containing only those fields. But that won't suffice if you are also using MPI functions that read the non-public ones, e.g. MPI_Get_count. In that case, a dumb non-OO approach is to simply embed the original status structure:

In mytypes.h:

// 64 bytes should cover most MPI implementations
#define MY_MAX_STATUS_SIZE 64

typedef struct my_MPI_Status
{
   int MPI_SOURCE;
   int MPI_TAG;
   int MPI_ERROR;
   char _original[MY_MAX_STATUS_SIZE];
} my_MPI_Status;

In mympi.h:

#define MPI_Status        my_MPI_Status
#define MPI_STATUS_IGNORE ((my_MPI_Status*)NULL)

extern int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Status *status);
extern int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count);

In mpiwrap.c:

int MPI_Recv(void *buf, int count, my_MPI_Datatype datatype, int dest, int tag, my_MPI_Comm comm, my_MPI_Status *status)
{
   MPI_Status *real_status = (status != NULL) ? (MPI_Status*)&status->_original : MPI_STATUS_IGNORE;
   int res = PMPI_Recv(buf, count, (MPI_Datatype)datatype, dest, tag, (MPI_Comm)comm, real_status);
   if (status != NULL)
   {
      status->MPI_SOURCE = real_status->MPI_SOURCE;
      status->MPI_TAG = real_status->MPI_TAG;
      status->MPI_ERROR = real_status->MPI_ERROR;
   }
   return res;
}

int MPI_Get_count(my_MPI_Status *status, my_MPI_Datatype datatype, int *count)
{
   MPI_Status *real_status = (status != NULL) ? (MPI_Status*)&status->_original : MPI_STATUS_IGNORE;
   return PMPI_Get_count(real_status, (MPI_Datatype)datatype, count);
}

In your code:

#include "mympi.h"

...
MPI_Status status;
int count;

MPI_Recv(buf, 100, MPI_INT, 0, 10, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &count);
...

Your build system should then check if sizeof(MPI_Status) of the actual MPI implementation is less than or equal to MY_MAX_STATUS_SIZE.
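
That check can also live in the wrapper source itself as a compile-time assertion (again just a sketch, assuming C11):

// In mpiwrap.c: fails the build if the real MPI_Status is too large
_Static_assert(sizeof(MPI_Status) <= MY_MAX_STATUS_SIZE,
               "MY_MAX_STATUS_SIZE is too small for this MPI implementation");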

The above is just a quick and dirty idea - I haven't tested it, and some const qualifiers or casts might be missing here or there. But it should work in practice and be reasonably maintainable.

Hristo Iliev
  • Thanks, Hristo Iliev. I need to get a better understanding of what PMPI does and whether it's a dependency we may impose on the customer. It seems it's "native" to each MPI implementation, so this should work. I'm worried about performance penalties. Is performance going to take a hit by using PMPI, which, as I understand it, was originally designed for profiling purposes? – blue scorpion Jul 19 '16 at 15:12
  • Because PMPI was designed for profiling, its impact is really minimal. The overhead is just one function call plus the argument casts (which on x86-64 should be really minimal given the ABI). How much could it possibly hit the performance when most MPI calls, especially the communication ones, tend to take far longer to execute? Of course, it is better to extract larger blocks of functionality, e.g. halo exchange, data shift, etc. – Hristo Iliev Jul 19 '16 at 17:54
  • 2
    Btw, PMPI is really simple. It basically means that (when possible) all `MPI_Bla_bla` symbols are weak aliases of the "true" MPI functions `PMPI_Bla_bla` and as such can be overridden by other code providing symbols with the same name. It does not introduce any additional dependencies besides PMPI support in the library (virtually all vendors enable it by default). But, as I said at the very top of my answer, this is an abuse of PMPI and is here just to give you (and anyone else interested) a different way to approach the problem with MPI's own mechanisms. – Hristo Iliev Jul 19 '16 at 18:00
  • I haven't found a "nuts and bolts" info web page for PMPI, yet. My question is: would PMPI be offered _along_ with a given MPI implementation? Say I want to use MVAPICH GDR (http://mvapich.cse.ohio-state.edu/userguide/gdr/), is the corresponding MVAPICH PMPI _guaranteed_ to use the underlying MPI GDR? In other words, is the MPI <-> PMPI mapping _guaranteed_ to be 1-to-1? – blue scorpion Jul 19 '16 at 19:16
  • 1
    PMPI is not a separate layer but _part_ of the implementation. A PMPI-enabled implementation has all its exported symbols named `PMPI_...` instead of `MPI_...`, then `MPI_...` are provided as a weak aliases to `PMPI_...`. When you link against such library, any call to `MPI_...` in your code is actually a call to `PMPI_...` as both symbols have the same address. Since the `MPI_...` symbols are weak, they can be replaced by user functions with the same name. The user code can then access the MPI library via the `PMPI_...` symbols. – Hristo Iliev Jul 20 '16 at 07:59
  • I don't know how to better explain it... Use `nm libmpi.a` or `objdump -T libmpi.so` on the MVAPICH libraries to see for yourself. Global text symbols (exported functions) are marked with `T` (`nm`) or `g` (`objdump`), weak symbols as `W` (`nm`) or `w` (`objdump`). You can also take a stack trace of a running MPI executable and see that most debuggers will actually show `PMPI_Name` instead of `MPI_Name` in the trace. I believe that should convince you that the answer to your last question is: _yes, as long as the wrapper links against the same MPI library_. – Hristo Iliev Jul 20 '16 at 08:21
  • Ok, got it. Thank you. – blue scorpion Jul 20 '16 at 14:19
2

Considering that MPI is a well-defined API, you can easily provide both the header and the source code of the MPI wrapper. The customer simply needs to compile it against their MPI implementation, and you dynamically load the result into your solver. There is no need for the customer to implement anything.

In addition to the actual function wrapping, there are basically two things to consider:

  1. As you already pointed out, structs may differ. So you have to wrap them. In particular, you need to consider the size of these structs, so you cannot allocate them in your solver code. I would make a case for C++, because you can use RAII.

  2. Return codes, MPI_Datatype and other macros / enums. I would make another case for C++, because it's natural to convert return codes to exceptions.

header

// DO NOT include mpi.h in the header. Only use forward-declarations
#include <memory>

struct MPI_Status;

class my_MPI_Status {
public:
    my_MPI_Status();
    ~my_MPI_Status(); // defined in the source, where MPI_Status is complete
    // Never used directly by your solver.
    // You can make it private and friend your implementation.
    MPI_Status* get() { return pimpl.get(); }
    int source() const;
    ... tag, error
private:
    std::unique_ptr<MPI_Status> pimpl;
};

class my_MPI_Request ...

source

#include <mpi.h>

static void handle_rc(int rc) {
    switch (rc) {
        case MPI_SUCCESS:
            return;
        case MPI_ERR_COMM:
            throw my_mpi_err_comm;
        ...
    }
}

// Note: This encapsulates the size of the `struct MPI_Status`
// within the source. Use `std::make_unique` if available.
my_MPI_Status::my_MPI_Status() : pimpl(new MPI_Status) {}
my_MPI_Status::~my_MPI_Status() = default; // MPI_Status is complete here

int my_MPI_Status::source() const {
    return pimpl->MPI_SOURCE;
}

void my_MPI_Wait(my_MPI_Request& request, my_MPI_Status& status) {
    handle_rc(MPI_Wait(request.get(), status.get()));
}

Note that the number of arguments for each MPI function is well defined in the MPI standard. There is no need to adapt that.

jxh
Zulan
  • This is really the Bridge Pattern, where the code to the **ConcreteImplementor** is provided by the solver. But, upvoted nonetheless. – jxh Jul 19 '16 at 09:12
  • Thanks, Zulan. Yes, forward declarations should be enough provided our solver doesn't use them in our code. The problem is that our solver code is riddled with MPI_Comm, MPI_Request, and MPI_Status variables. I will clean some of them, but I cannot clean all (e.g., we maintain some vector of MPI_Request's). The only way around that is to define proxy pointers for each (say, BaseMPIRequest*) that the impl overrides and sets accordingly via some setter functions. So, fwd decl are not enough, unfortunately... – blue scorpion Jul 19 '16 at 14:44
  • Basically, if you want to make sure there are no conflicts, you must not ever have `mpi.h` included in your solver (directly or indirectly). If your code is riddled with legacy MPI, the `#define` & PMPI solution by Hristo Iliev may be less intrusive to the solver. – Zulan Jul 19 '16 at 14:55
  • Yes, Hristo Iliev's solution sounds great. Again, I'm worried about PMPI penalties. Good news is that Zulan is right: once we have a wrapper, each customer can use it, they just have to compile-build it into a shared library against their own MPI. So, the customer doesn't need to write their own wrapper. They just need to re-compile the wrapper we give them. Thank you. – blue scorpion Jul 19 '16 at 15:53
  • 1
    @bluescorpion, I don't get your penalty worries. You want to build a wrapper anyway. Whether you will name your wrapper function `my_mpi_send` and have it call `MPI_Send`, and then replace all calls to `MPI_Send` in your code with calls to `my_mpi_send` or you will name the wrapper function `MPI_Send` and have it call `PMPI_Send` (which is the true name of `MPI_Send` in the MPI library), and don't touch the solver code at all, the overhead is *exactly the same*. It is simply a matter of how the functions in the wrapper are named. – Hristo Iliev Jul 20 '16 at 08:28
1

This seems to be an obvious use case for the Bridge Pattern.

In this case the generic interface for MPI is the Implementor. The customer is expected to provide the ConcreteImplementor for their specific MPI instance. Your solver code would be the RefinedAbstraction as the Abstraction provides the bridge to the Implementor.

Abstract_Solver <>--> MPI_Interface
      .                    .
     /_\                  /_\
      |                    |

    Solver            MPI_Instance

The customer inherits from MPI_Interface and implements it against its MPI instance of choice. The implementation is then fed to the solver interface and used by the Abstract_Solver as it is doing its work.

Thus, you can make MPI_Interface as type safe as necessary for Abstract_Solver to get its work done. No void * is necessary. The implementor of MPI_Instance can store whatever implementation-specific MPI state it needs within its instantiated object in order to fulfill the contract required by the interface. As an example, the comm argument could be elided from the MPI_Interface. The interface could just assume that a separate comm would require a separate instance of MPI_Instance (initialized to a different comm).

While the Bridge Pattern is object-oriented, this solution is not limited to C++. You can easily specify an abstract interface in C (as seen in this dynamic dispatching example).
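
A rough sketch of what such a C interface could look like (the names are purely illustrative; the data buffers stay void* as in MPI itself, but no MPI handle crosses the interface, and the chosen comm lives inside the customer's implementation object):

/* my_mpi_interface.h -- illustrative only, not a complete interface */
typedef struct MPI_Interface MPI_Interface;

struct MPI_Interface {
    /* count is in bytes in this sketch; a real interface would expose typed variants */
    int (*send)(MPI_Interface *self, const void *buf, int count,
                int dest, int tag);
    int (*recv)(MPI_Interface *self, void *buf, int count,
                int source, int tag);
    void (*destroy)(MPI_Interface *self);
};

/* Provided by the customer: returns an object whose first member is this
   struct of function pointers, with the MPI_Comm and any other MPI state
   kept in the implementation struct behind it. */
MPI_Interface *make_mpi_instance(void);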

jxh
  • Thank you, jxh. This looks straightforward. – blue scorpion Jul 18 '16 at 18:38
  • I think, elision of arguments like comm is the only feasible way, actually. Let's say in my solver I have: func(...){ MPI_Comm my_comm=...; MPI_Send(...,my_comm); } But when replacing with stub calls, my_comm _definition_ must be replaced by a proxy wrapper. This won't work, because the wrapper type doesn't have the necessary data members; it's just a (necessarily non-abstract) "empty" base class. So, MPI management types like MPI_Request, MPI_Comm cannot be visible to the wrapper interface, only the implementation can see them. Am I wrong, here? – blue scorpion Jul 18 '16 at 20:09
  • 1
    It is probably easiest to force your customer to properly fulfill the contract, rather than you take arbitrary opaque arguments that would be too easy to get wrong. – jxh Jul 18 '16 at 20:22
  • This answer ignores the fact that MPI is a well-defined API. There is no need to have the customer implement anything. – Zulan Jul 19 '16 at 06:33
  • @Zulan: Perhaps the asker can provide a template `MPI_Instance` that allows the customer to avoid their own implementation. However, this solution allows the customer the opportunity to use the solver in an environment that has something different from MPI. – jxh Jul 19 '16 at 08:05