From the documentation, its arguments are:

int MPI_Type_create_subarray(
  int ndims,
  int array_of_sizes[],
  int array_of_subsizes[],
  int array_of_starts[],
  int order,
  MPI_Datatype oldtype,
  MPI_Datatype *newtype
);

However, I cannot understand how this function receives the original array we want to split, or where it returns the new subarray (since the function itself returns an integer). In other words, I would simply like to see a basic usage example of this function in C++, which I have not been able to find on the Internet.

1 Answer

MPI_Type_create_subarray() neither takes an original array nor returns a subarray; it creates an MPI type which describes the memory layout of a subarray given: a larger array of some given type; a set of subsizes; and a "corner" from which to start.

You can then use this newly created MPI type to extract just the data you want from any appropriately-sized array and send it in a message to another task (with point-to-point message passing routines), to all other tasks (via collectives), or write it to disk (with MPI-IO). In the following example, rank 0 uses an MPI subarray type to extract a subarray from a larger array of integers and sends it to rank 1. Rank 1, receiving the data into a contiguous buffer, doesn't need any special type on its end; it just receives the data as so many integers.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void printarr(int **data, int n, char *str);
int **allocarray(int n);

int main(int argc, char **argv) {

    /* array sizes */
    const int bigsize = 10;
    const int subsize = 5;

    /* communications parameters */
    const int sender  =0;
    const int receiver=1;
    const int ourtag  =2;

    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < receiver+1) {
        if (rank == 0)
            fprintf(stderr,"%s: Needs at least %d  processors.\n", argv[0], receiver+1);
        MPI_Finalize();
        return 1;
    }

    if (rank == sender) {
        int **bigarray = allocarray(bigsize);
        for (int i=0; i<bigsize; i++)
            for (int j=0; j<bigsize; j++)
                bigarray[i][j] = i*bigsize+j;

        printarr(bigarray, bigsize, " Sender: Big array ");

        MPI_Datatype mysubarray;

        int starts[2] = {5,3};
        int subsizes[2]  = {subsize,subsize};
        int bigsizes[2]  = {bigsize, bigsize};
        MPI_Type_create_subarray(2, bigsizes, subsizes, starts,
                                 MPI_ORDER_C, MPI_INT, &mysubarray);
        MPI_Type_commit(&mysubarray);

        MPI_Send(&(bigarray[0][0]), 1, mysubarray, receiver, ourtag, MPI_COMM_WORLD);
        MPI_Type_free(&mysubarray);

        free(bigarray[0]);
        free(bigarray);

    } else if (rank == receiver) {

        int **subarray = allocarray(subsize);

        for (int i=0; i<subsize; i++)
            for (int j=0; j<subsize; j++)
                subarray[i][j] = 0;

        MPI_Recv(&(subarray[0][0]), subsize*subsize, MPI_INT, sender, ourtag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printarr(subarray, subsize, " Receiver: Subarray -- after receive");

        free(subarray[0]);
        free(subarray);
    }

    MPI_Finalize();
    return 0;
}

void printarr(int **data, int n, char *str) {
    printf("-- %s --\n", str);
    for (int i=0; i<n; i++) {
        for (int j=0; j<n; j++) {
            printf("%3d ", data[i][j]);
        }
        printf("\n");
    }
}

int **allocarray(int n) {
    int *data = malloc(n*n*sizeof(int));
    int **arr = malloc(n*sizeof(int *));
    for (int i=0; i<n; i++)
        arr[i] = &(data[i*n]);

    return arr;
}

Running this gives

$ mpicc -o subarray subarray.c  -std=c99 -Wall -g
$ mpirun -np 2 ./subarray
    --  Sender: Big array  --
  0   1   2   3   4   5   6   7   8   9 
 10  11  12  13  14  15  16  17  18  19 
 20  21  22  23  24  25  26  27  28  29 
 30  31  32  33  34  35  36  37  38  39 
 40  41  42  43  44  45  46  47  48  49 
 50  51  52  53  54  55  56  57  58  59 
 60  61  62  63  64  65  66  67  68  69 
 70  71  72  73  74  75  76  77  78  79 
 80  81  82  83  84  85  86  87  88  89 
 90  91  92  93  94  95  96  97  98  99 
--  Receiver: Subarray -- after receive --
 53  54  55  56  57 
 63  64  65  66  67 
 73  74  75  76  77 
 83  84  85  86  87 
 93  94  95  96  97
Jonathan Dursi
    +1 for the nice example. Use `MPI_Sendrecv` and save some code. – Hristo Iliev Nov 12 '12 at 16:17
  • I find new users get confused when one's sendrecv()ing to one's self, but maybe introducing nonblocking communications is no better here. Probably clearest would have been to just send to another rank. I'll leave it as is for now, but if it causes questions I'll do something else. – Jonathan Dursi Nov 12 '12 at 16:21
  • Ah, shoot, just noticed that I didn't MPI_Type_free. As long as I have to change it anyway... – Jonathan Dursi Nov 12 '12 at 16:52
  • @JonathanDursi Thank you very much for your great answer! Sorry if I wasn't able to reply earlier. – Pippo Nov 12 '12 at 18:47
  • The example is clear. Just two questions: 1) What is the exact meaning of **? – Pippo Nov 12 '12 at 19:00
  • 2) If I understood correctly, `MPI_Type_create_subarray` returns a sort of pointer to a certain portion of an array. Suppose we have a matrix, which we would like to evolve passing each block to a thread (e.g., to compute the diffusion in 2D). In this way, there should be no need of `MPI_Isend` and `MPI_Irecv`: we could simply pass a "moving" subarray to each thread, compute its evolution, wait for every thread, and finally "free" the original matrix. Is this right? – Pippo Nov 12 '12 at 19:03
  • @Pippo - `**`, in lines like `int **bigarray = ...` means that `bigarray` is a pointer to a pointer to some data type; this is the somewhat awkward way C allows you to make multi-dimensional arrays, dereferencing bigarray twice (eg, `bigarray[1][3]`) to get an int. That's not something I'll be able to explain in the confines of a comment, but you can search for "C multidimensional arrays" for explanations. As to the other question, I'm not quite sure I understand what you're trying to do. By `thread` do you mean MPI tasks (which generally aren't threads)? – Jonathan Dursi Nov 12 '12 at 19:28
  • @JonathanDursi Thank you for your answer. Yes, by threads I mean MPI tasks. :) Sorry for the misleading word, but in what I would like to do (the computation of diffusion on a bidimensional grid), they should be threads, as to compute the value of each point one just needs to know its immediate surroundings. Hence, I would like to pass to each node a block of the diffusion matrix and compute the values of its evolution. – Pippo Nov 12 '12 at 19:47
  • There is the problem of the so called "ghost rows": if we divide the matrix in blocks, each one constituted by a certain number of whole rows, we have to pay attention to consider two other rows in addition, one above and the other below the block, which help to compute the values of the block but which should not be computed themselves (as they have no surroundings). But at least I have already solved this problem. :) – Pippo Nov 12 '12 at 19:48
  • So yes, you can (certainly) use `mpi_type_create_subarray` to send and receive halo zones / ghost rows. You'd create subarray types to describe the "real" updated data you were sending to your up/down/left/right neighbours, and to describe the ghost rows you were receiving in to, and send and receive them as in the example above. But you still have to do the sending and receiving. The subarray type you create doesn't do any communication, nor does it generate any pointers or anything; it just describes a memory layout which allows you to send/receive directly into your grid. – Jonathan Dursi Nov 12 '12 at 20:21
  • @JonathanDursi Sorry to bother you again, but I did not understand how to do the sending and receiving. I mean, should I send the block matrix from a "master" rank (which could be 0) to the other ranks (maybe through an `if... else` structure) and then receive the results back at the master? – Pippo Nov 12 '12 at 21:14
  • If you're trying to distribute blocks from an array from one task to all the others using subarrays, you can take a look at [this question and answer](http://stackoverflow.com/questions/5585630/mpi-type-create-subarray-and-mpi-gather); if you're just trying to use subarrays to deal with ghost row exchanges, I think it'd be easiest to post that as a separate question, describing clearly what you want to do and what you're having problems with. – Jonathan Dursi Nov 12 '12 at 21:17
  • @JonathanDursi Thank you for your help. At the end, I managed to run my code. :) – Pippo Nov 13 '12 at 20:35
  • Sendrecv should be equivalent to Isend+Irecv+Waitall(2). It's less prone to deadlock than Send+Recv in the hands of novice MPI programmers. Personally, I only use Isend+Irecv, because it isn't limited to pairwise communication the way Sendrecv is. – Jeff Hammond Jan 02 '16 at 19:10
  • @JonathanDursi What if we send subarrays with elements of normal types, eg. MPI_CHAR and receive into a big array using the new mpi type? ie. Exchanging the custom type & normal type send and receive operations. I suppose there's not problem in doing that right? – KeyC0de Feb 26 '17 at 23:37
  • @RestlessC0bra - Yes, that's fine too. The sending and receiving data types have to be consistent, but needn't be the same. That is, if one specifies (say) 100 MPI_INTs, the other must too, but the second can describe a different memory layout for those 100 ints than the first. – Jonathan Dursi Feb 27 '17 at 13:40
  • 1
    @RestlessC0bra Yes. In the standard, types being consistent means the amount and underlying type (e.g. ,MPI_INT) of data has to be the same but the layout can differ. – Jonathan Dursi Feb 27 '17 at 16:20