
What is an optimal way to send an OpenCV Mat over MPI? So far I have done it by converting the Mat to int**, but this is a rather slow solution.

A = alloc2d(n , m);
for (int i = 0; i < n ; ++i)
    for (int j = 0; j < m ; ++j)
        A[i][j] = img.at<uchar>(i , j);

/////////////////////////////////////
int ** alloc2d(int rows, int cols) {
    int * data = (int *)malloc(rows * cols * sizeof(int));
    int ** arr = (int **)malloc(rows * sizeof(int *));
    for (int i = 0; i < rows; ++i)
        arr[i] = &(data[cols * i]);
    return arr;
}
RicUfa

1 Answer


Check the original Mat is contiguous first, and clone it if it isn't.

Then just get the:

  • rows
  • columns
  • type
  • channels

of the original Mat and save in that order, each as 4 bytes, at the start of a buffer. Then append the appropriate number of bytes from the original Mat's data pointer and send the whole lot.

Do the opposite at the receiving end... read the first four integers from the buffer and create a Mat of the corresponding size and load the remainder of the data into it.
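The header-then-data layout described above can be sketched without MPI or OpenCV. This is only an illustration under stated assumptions: `Packed`, `pack` and `unpack` are hypothetical names, the struct is a plain stand-in for a Mat, and both ends are assumed to share the same integer size and byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the four header fields plus the pixel bytes of a Mat.
struct Packed {
    std::int32_t rows, cols, type, channels;
    std::vector<unsigned char> pixels;
};

// Serialise: four 4-byte integers in the order rows, cols, type, channels,
// followed by the raw pixel bytes.
std::vector<unsigned char> pack(const Packed& p) {
    const std::int32_t header[4] = {p.rows, p.cols, p.type, p.channels};
    std::vector<unsigned char> buf(sizeof(header) + p.pixels.size());
    std::memcpy(buf.data(), header, sizeof(header));
    if (!p.pixels.empty())
        std::memcpy(buf.data() + sizeof(header), p.pixels.data(), p.pixels.size());
    return buf;
}

// Deserialise: read the header back, then treat the remainder as pixel data.
Packed unpack(const std::vector<unsigned char>& buf) {
    std::int32_t header[4];
    assert(buf.size() >= sizeof(header)); // the buffer must at least hold the header
    std::memcpy(header, buf.data(), sizeof(header));
    Packed p{header[0], header[1], header[2], header[3], {}};
    p.pixels.assign(buf.begin() + sizeof(header), buf.end());
    return p;
}
```

In a real exchange, the vector returned by `pack` would be what gets handed to the send call, and the receiver would pass the received bytes to `unpack`.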


@Miki provides an excellent, related answer here which demonstrates the details of most of the techniques suggested above - look specifically at Mat2str() and str2Mat().

I don't do much C++ or MPI, and I am sure anyone who uses either a lot could tighten it up, but the following works, and works pretty fast too!

#include <cstdlib>
#include <iostream>
#include <iomanip>
#include <ctime>
#include <cstring>
#include <string>
#include <chrono>
#include <thread>
#include <opencv2/opencv.hpp>
#include "opencv2/highgui/highgui.hpp"
#include "mpi.h"

using namespace std;
using namespace cv;

const int MAXBYTES=8*1024*1024;
uchar buffer[MAXBYTES];

void matsnd(const Mat& m,int dest){
      int rows  = m.rows;
      int cols  = m.cols;
      int type  = m.type();
      int channels = m.channels();
      memcpy(&buffer[0 * sizeof(int)],(uchar*)&rows,sizeof(int));
      memcpy(&buffer[1 * sizeof(int)],(uchar*)&cols,sizeof(int));
      memcpy(&buffer[2 * sizeof(int)],(uchar*)&type,sizeof(int));

      // See note at end of answer about "bytes" variable below!!!
      int bytespersample=1; // change if using shorts or floats
      int bytes=m.rows*m.cols*channels*bytespersample;
cout << "matsnd: rows=" << rows << endl;
cout << "matsnd: cols=" << cols << endl;
cout << "matsnd: type=" << type << endl;
cout << "matsnd: channels=" << channels << endl;
cout << "matsnd: bytes=" << bytes << endl;

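      // m is a const reference and cannot be reassigned, so work on a local
      // Mat header and clone only if the data is not contiguous
      Mat contiguous = m;
      if(!contiguous.isContinuous())
      { 
         contiguous = contiguous.clone();
      }
      memcpy(&buffer[3*sizeof(int)],contiguous.data,bytes);
      MPI_Send(buffer,bytes+3*sizeof(int),MPI_UNSIGNED_CHAR,dest,0,MPI_COMM_WORLD);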
}

Mat matrcv(int src){
      MPI_Status status;
      int count,rows,cols,type,channels;

      MPI_Recv(&buffer,sizeof(buffer),MPI_UNSIGNED_CHAR,src,0,MPI_COMM_WORLD,&status);
      MPI_Get_count(&status,MPI_UNSIGNED_CHAR,&count);
      memcpy((uchar*)&rows,&buffer[0 * sizeof(int)], sizeof(int));
      memcpy((uchar*)&cols,&buffer[1 * sizeof(int)], sizeof(int));
      memcpy((uchar*)&type,&buffer[2 * sizeof(int)], sizeof(int));

cout << "matrcv: Count=" << count << endl;
cout << "matrcv: rows=" << rows << endl;
cout << "matrcv: cols=" << cols << endl;
cout << "matrcv: type=" << type << endl;

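      // Make the mat. The header below only wraps the shared global buffer,
      // so clone it to give the caller its own copy that survives the next receive
      Mat received= Mat(rows,cols,type,(uchar*)&buffer[3*sizeof(int)]);
      return received.clone();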
}

int main ( int argc, char *argv[] )
{
   // Initialise MPI (C API; the MPI:: C++ bindings were removed in MPI-3.0)
   MPI_Init(&argc,&argv);

   //  Get our rank
   int id;
   MPI_Comm_rank(MPI_COMM_WORLD,&id);
   if(id==0) 
   {
      // MASTER - wait to receive image from slave and write to disk for checking
      Mat received=matrcv(1);
      imwrite("received.jpg",received);
   }else{
      // Slave - read Mat from disk and send to master
      Mat image=imread("image.jpg",IMREAD_COLOR);
      matsnd(image,0);
   }

   //  Terminate MPI
   MPI_Finalize();
   return 0;
}

I put a loop with 10,000 iterations around:

  • matsnd() in the slave, and
  • matrcv() in the Master

and it took 1.9 seconds for the 10,000 iterations. I cannot compare as you didn't show any timings.
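The timing loop itself is not shown above. A minimal sketch of how such a measurement could be taken, assuming `std::chrono::steady_clock`; `timeLoop` is a hypothetical helper and `op` stands in for a call such as `matsnd()` or `matrcv()`:

```cpp
#include <chrono>
#include <functional>

// Run `op` `iterations` times and return the elapsed wall-clock time in seconds.
double timeLoop(int iterations, const std::function<void()>& op) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        op();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Each rank would time its own side of the exchange, so the two figures can differ slightly.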

All the cout statements that are hard-left justified are just debug stuff that can safely be removed.

Note:

Whilst I have used and tested the above, I have since learned that the calculation of the number of bytes I send may be incorrect in some circumstances (probably where there are alignment constraints). If you are interested, please check this answer.
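One situation where a plain rows × cols × channels count would not match the memory layout is when each row is followed by padding bytes. The sketch below is an illustration only: `View`, `payloadBytes` and `packRows` are hypothetical names, and the struct merely mirrors the roles of `Mat::data`, `Mat::elemSize()` and `Mat::step` as I understand them. It copies row by row so that padding is never included.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical view of image memory, mirroring the Mat fields that matter here.
struct View {
    const unsigned char* data;
    int rows, cols;
    std::size_t elemSize; // bytes per pixel across all channels
    std::size_t step;     // bytes from the start of one row to the start of the next
};

// Payload size, excluding any padding at the end of each row.
std::size_t payloadBytes(const View& v) {
    return static_cast<std::size_t>(v.rows) * v.cols * v.elemSize;
}

// Copy the pixels row by row so that padding bytes are never sent.
std::vector<unsigned char> packRows(const View& v) {
    const std::size_t rowBytes = v.cols * v.elemSize;
    assert(v.step >= rowBytes); // a row cannot be shorter than its pixels
    std::vector<unsigned char> out(payloadBytes(v));
    for (int r = 0; r < v.rows; ++r)
        std::memcpy(out.data() + r * rowBytes, v.data + r * v.step, rowBytes);
    return out;
}
```

For a Mat that is already contiguous, `step` equals `cols * elemSize` and the loop reduces to a single block copy.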

Keywords: MPI, MPI_Send, MPI_Recv, OpenCV, Mat, image

Mark Setchell
  • I saw a related answer just the other day by @Miki (I think) but can't find it for the moment and it's late so I'll hunt again tomorrow and add a link if I can. – Mark Setchell Apr 19 '17 at 21:44
  • If the matrix is big, it will be cheaper to register an MPI structure datatype with several elements and use it to directly send both the descriptor and the data without the need to pack everything in a single array. On the receiver side, `MPI_Probe` and `MPI_Get_count` can be used to determine the size of the data and used to preallocate the matrix. – Hristo Iliev Apr 20 '17 at 11:05
  • Here is a piece of code I wrote for a similar question http://stackoverflow.com/questions/28782951/segmentation-fault-while-using-mpi-and-opencv-together/28794709#28794709 sending the size and the buffer pointed to by `im.data`. It might inspire you... – francis Apr 20 '17 at 17:16