
TLDR

My for loop hangs when I create files in parallel (code below). Why? Also, what is a safe/efficient way to write to multiple binary files, with the file and offset determined by the iteration variable?

Context and questions:

What I would like my code to do is the following:

(1) All processes read a single binary file containing a matrix of doubles -> already achieved this using MPI_File_read_at()

(2) For each 'column' of input data, perform calculations using the numbers in each 'row', and save the data for each column into its own binary output file ("File0.bin" -> column 0)

(3) To let the user specify an arbitrary number of processes, I use simple indexing to treat the matrix as one long (rows)X(cols) vector and split that vector across the processes, so each process gets roughly (rows)X(cols)/tot_proc entries to handle. With this approach the columns are not neatly divided among processes, so each process needs to access whatever file(s) correspond to its entries and, using the proper offsets, write to the correct section of the correct file. At the moment it does not matter that the resulting files will be fragmented.

As I work toward that goal, I have written a short program to create binary files in a loop, but the loop hangs on the very last iteration (13 files divided over 4 processes). Number of files to create = (rows).

Question 1 Why does this code hang at the very end of the loop? In my toy example of 4 processes, id_proc 1-3 have 3 files to create, while id_proc 0 (the root process) has 4 files to create. The loop hangs when the root process tries to make its 4th file. Note: I'm compiling this on a laptop running Ubuntu using mpic++.

Question 2 Eventually I will add a second for loop just like the one you see below, except in this loop, the process must write to the appropriate section of the binary files that have already been created. I plan to use MPI_File_write_at() to do this, but I have also read that the files should be statically sized using MPI_File_set_size(), and then, every process should have its own view of the file using MPI_File_set_view(). So, my question is, in order for this to work, should I do the following?

(Loop 1) MPI_File_open(...,MPI_MODE_WRONLY | MPI_MODE_CREATE,...), MPI_File_set_size(), MPI_File_close()

(Loop 2) MPI_File_open(...,MPI_WRONLY,...), MPI_File_set_view(), MPI_File_write_at(), MPI_File_close()

Loop 2 seems like it will be slowed by having to open and close files on each iteration, but I do not know in advance how much input data the user will provide, nor how many processes the user will request. For example, process N might need to write to the end of file 1, the middle of file 2, and the end of file 8. In principle, all of that can be handled with offsets. What I don't know is whether MPI allows this level of flexibility.

Code attempting to create multiple files in parallel:

#include <iostream>
#include <cstdlib>
#include <stdio.h>
#include <vector>
#include <fstream>
#include <string>
#include <sstream>
#include <cmath>
#include <sys/types.h>
#include <sys/stat.h>
#include <mpi.h>

using namespace std;

int main(int argc, char** argv)
{
    //Variable declarations
    string oname;
    stringstream temp;
    int rows = 13, cols = 7, sz_dbl = sizeof(double);
    //each binary file will eventually have 7*sz_dbl bytes
    int id_proc, tot_proc, loop_min, loop_max;
    vector<double> output(rows*cols,1.0);//data to write

    //MPI routines
    MPI_Init(&argc,&argv);//initialize MPI
    MPI_Comm_rank(MPI_COMM_WORLD,&id_proc);//get "this" node's id#/rank
    MPI_Comm_size(MPI_COMM_WORLD,&tot_proc);//get the number of processors

    //MPI loop variable assignments
    loop_min = id_proc*rows/tot_proc + min(rows % tot_proc, id_proc);
    loop_max = loop_min + rows/tot_proc + (rows % tot_proc > id_proc);

    //File handle
    MPI_File outfile;

    //Create binary files in parallel
    for(int i = loop_min; i < loop_max; i++)
    {
        temp << i;
        oname = "Myout" + temp.str() + ".bin";
        MPI_File_open(MPI_COMM_WORLD, oname.c_str(), MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &outfile);
        temp.clear();
        temp.str(string());
        MPI_File_close(&outfile);
    }
    MPI_Barrier(MPI_COMM_WORLD);//with or without this, same error

    MPI_Finalize();//MPI - end mpi run
    return 0;
}

Tutorial/information pages I've read so far:

http://beige.ucs.indiana.edu/B673/node180.html

http://beige.ucs.indiana.edu/B673/node181.html

http://mpi-forum.org/docs/mpi-2.2/mpi22-report/node305.htm

https://www.open-mpi.org/doc/v1.4/man3/MPI_File_open.3.php

http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node215.htm

Parallel output using MPI IO to a single file

Is it possible to write with several processors in the same file, at the end of the file, in an ordonated way?

Eric Inclan
  • The easiest approach is to use process id 0 as master and feed all IO through it. This won't scale to massive parallelisation, but it's likely you've only got a few processors. – Malcolm McLean Jun 26 '17 at 11:16
  • @MalcolmMcLean Thank you for your comment. In my case, I do have access to several thousand processors so it is my hope that I'll be able to scale massively. – Eric Inclan Jun 26 '17 at 21:28

1 Answer


MPI_File_open() is a collective operation: all tasks of the communicator (here MPI_COMM_WORLD) must open the same file at the same time. Your ranks run different numbers of iterations, so the last collective open has no matching call on the other ranks.

If you want each task to open its own file independently, use MPI_COMM_SELF instead.
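For what it's worth, here is an untested sketch of how both loops from the question could look with per-process opens: each rank creates and pre-sizes its own files on MPI_COMM_SELF, then writes with explicit offsets via MPI_File_write_at(). As far as I can tell, MPI_File_set_view() is not required here, since explicit-offset writes go through the default byte view. The file names and splitting arithmetic follow the question; std::to_string() replaces the stringstream for brevity.

```cpp
#include <mpi.h>
#include <algorithm>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int id_proc, tot_proc;
    MPI_Comm_rank(MPI_COMM_WORLD, &id_proc);
    MPI_Comm_size(MPI_COMM_WORLD, &tot_proc);

    const int rows = 13, cols = 7;
    std::vector<double> output(rows * cols, 1.0);

    // Loop 1: each rank creates and pre-sizes its own share of the files.
    // MPI_COMM_SELF makes the open local, so uneven counts cannot deadlock.
    int file_min = id_proc * rows / tot_proc + std::min(rows % tot_proc, id_proc);
    int file_max = file_min + rows / tot_proc + (rows % tot_proc > id_proc);
    for (int i = file_min; i < file_max; i++) {
        MPI_File f;
        std::string name = "Myout" + std::to_string(i) + ".bin";
        MPI_File_open(MPI_COMM_SELF, name.c_str(),
                      MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &f);
        MPI_File_set_size(f, cols * sizeof(double)); // fixed size: cols doubles
        MPI_File_close(&f);
    }
    MPI_Barrier(MPI_COMM_WORLD); // all files exist before anyone writes

    // Loop 2: split the flattened matrix by entry index; entry k lands in
    // file k / cols at byte offset (k % cols) * sizeof(double).
    long long n = (long long)rows * cols;
    long long k_min = id_proc * n / tot_proc + std::min<long long>(n % tot_proc, id_proc);
    long long k_max = k_min + n / tot_proc + (n % tot_proc > id_proc);
    for (long long k = k_min; k < k_max; k++) {
        MPI_File f;
        std::string name = "Myout" + std::to_string(k / cols) + ".bin";
        MPI_File_open(MPI_COMM_SELF, name.c_str(), MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &f);
        MPI_File_write_at(f, (k % cols) * sizeof(double), &output[k], 1,
                          MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&f);
    }

    MPI_Finalize();
    return 0;
}
```

Opening and closing per entry is the worst case; since each rank's k range is contiguous, it could instead open each file it touches once and write its whole contiguous run with a single MPI_File_write_at().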

Gilles Gouaillardet
  • Thank you for your response. This fixed the bug in the code I displayed. By any chance, do you have any comments on Question 2? – Eric Inclan Jun 26 '17 at 21:29