
I have a big problem writing data to files using MPI on a cluster managed by PBS. Here is a simple program that reproduces the problem:

#include <mpi.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib>

int main(int argc, char* argv[]){
    int rank;
    int size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Define hostname
    char hostname[128];
    gethostname(hostname, 128);

    // Check and create dump directory
    struct stat buf;
    int rc;
    const char *dir = "Res";

    rc = stat(dir, &buf);
    if (rc) { // no dir, create
        if (rank == 0) {
            rc = mkdir(dir, 0771);
            if (rc) {
                std::ostringstream oss;
                oss << "Can't create dump directory \"" << dir << "\"";
                std::cout << oss.str() << std::endl;
            }
        }
        else {
            sleep(2);
        }
    }
    else if (!S_ISDIR(buf.st_mode)) {
        std::ostringstream oss;
        oss << "Path \"" << dir << "\" is not directory for dump";
        std::cout << oss.str() << std::endl;
    }

    MPI_Barrier(MPI_COMM_WORLD);

    // Every process defines the name of its output file (res_0, res_1, res_2, ...)
    std::ostringstream filename;
    filename << dir << "/res_" << rank;

    // Open file
    std::ofstream file(filename.str().c_str());

    // Output to file. Output looks like "I am 0 from 24.   hostname"
    file << "I am " << rank << " from " << size << ".   " << hostname << std::endl;

    file.close();

    MPI_Finalize();

    return 0;
}

I compile it with openmpi_intel-1.4.2 using the command:

mpicxx -Wall test.cc -o test

Then I queue the program with this script:

#!/bin/bash

#PBS -N test
#PBS -l select=8:ncpus=6:mpiprocs=6
#PBS -l walltime=00:01:30
#PBS -m n
#PBS -e stderr.txt
#PBS -o stdout.txt

cd $PBS_O_WORKDIR
echo "I run on node: `uname -n`"
echo "My working directory is: $PBS_O_WORKDIR"
echo "Assigned to me nodes are:"
cat $PBS_NODEFILE

mpirun -hostfile $PBS_NODEFILE ./test 

I expected this result:

1. A new directory "Res" to be created

2. 8*6 = 48 different files (res_0, res_1, res_2, ...) to be written to the Res directory

But only the res_* files from the first node (res_{0..5}) are written, while the rest are not.

What is the problem?

Thank you!

Krishal
  • Do all of the nodes have access to the same file location? – dbeer Aug 17 '15 at 17:53
  • @dbeer could be right. Imagine that each node has a different volume named `/scratch`. The new folder `Res` is created on the node of process 0 but not on the other nodes. As a consequence, `file(filename.str().c_str());` fails with the error `No such file or directory`. Could you use [`ios::fail()`](http://www.cplusplus.com/reference/ios/ios/fail/) as in http://stackoverflow.com/questions/5835848/when-will-ofstreamopen-fail, or use exceptions as in http://codereview.stackexchange.com/questions/57829/better-option-than-errno-for-file-io-error-handling, to check that? – francis Aug 17 '15 at 19:07
  • Hello guys. The server administrator said the problem is the time needed to create files. The program is very slow when writing ("w") files for the first time, so it's crucial to check whether node 0 has already created the folder and to wait until that line of code has finished. He gave the example that Java takes about 10 minutes to load the first time, while subsequent loads take about a minute. – Krishal Aug 18 '15 at 08:11
  • What's in the logs? If the directory was not created in 2 secs, you should have had `is not directory for dump` from the other ranks in your stdout.txt. – Dima Chubarov Aug 18 '15 at 13:51
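Following up on francis's suggestion, here is a minimal sketch of how each rank could check whether its `std::ofstream` actually opened, and why not. `open_for_dump` is a hypothetical helper name. Reading `errno` after a failed stream open is an assumption that holds on common POSIX implementations (the C++ standard does not guarantee it), so the errno text is only a hint.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: opens `path` for writing and reports why it failed.
// Returns an empty string on success, otherwise an error description.
// Assumption: the standard does not require std::ofstream to set errno, but
// common implementations (e.g. libstdc++ on Linux) open the file through the
// OS, which does; the errno text is therefore a best-effort hint only.
std::string open_for_dump(std::ofstream& file, const std::string& path) {
    errno = 0;
    file.open(path.c_str());
    if (!file.fail()) {
        return "";  // opened successfully
    }
    std::string msg = "cannot open \"" + path + "\"";
    if (errno != 0) {
        msg += ": ";
        msg += std::strerror(errno);  // e.g. "No such file or directory"
    }
    return msg;
}
```

In the question's program, each rank could print the returned message together with `rank` and `hostname`. If only the ranks on the first node succeed, the messages from the other ranks would show which nodes cannot see `Res`.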

1 Answer


OK, let's assume you run on a file system coherently mounted across all your compute nodes. This is the case, right? Then the main issue I see with your code snippet is that all processes `stat` the directory at the same time and then act on what they each see: rank 0 creates it if it doesn't exist, while the others merely sleep for 2 seconds and hope it is there by then. I'm not sure exactly what happens, but I'm sure this isn't the smartest idea ever.

Since in essence what you want is a serial sanity check of the directory and/or its creation if needed, why not just let the MPI process of rank 0 do it?
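As an aside (a different technique from the one this answer proposes), the check-then-create pattern can also be collapsed into a single `mkdir()` call: POSIX `mkdir()` either creates the directory or fails with `EEXIST` if the path already exists, so there is no window between checking and creating. A minimal sketch, with `ensure_directory` as a hypothetical helper name. It does not address directories that are slow to become visible on other nodes, so letting only rank 0 call it, as suggested here, still makes sense.

```cpp
#include <cerrno>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper: makes sure `dir` exists as a directory.
// Instead of "stat, then mkdir if missing" (two steps that other processes
// can interleave with), it calls mkdir() directly and treats EEXIST as
// "already created", then confirms the existing path is a directory.
// Returns 0 on success, otherwise an errno value.
int ensure_directory(const char* dir, mode_t mode) {
    if (mkdir(dir, mode) == 0) {
        return 0;                 // we created it
    }
    if (errno != EEXIST) {
        return errno;             // e.g. EACCES, or ENOENT for a missing parent
    }
    struct stat buf;
    if (stat(dir, &buf) != 0) {
        return errno;
    }
    return S_ISDIR(buf.st_mode) ? 0 : ENOTDIR;  // path exists but is not a dir
}
```

Rank 0 could call `ensure_directory("Res", 0771)` and call `MPI_Abort` on a nonzero result before everyone reaches the barrier.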

That would give you something like this:

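if ( rank == 0 ) { // Only master manages the directory creation
    struct stat buf;
    int rc = stat( dir, &buf );
    if ( rc != 0 ) {                          // no directory yet: create it
        rc = mkdir( dir, 0771 );
    } else if ( !S_ISDIR( buf.st_mode ) ) {   // path exists but is not a directory
        rc = -1;
    }
    if ( rc != 0 ) { // calling MPI_Abort() in case of failure seems also a good idea
        std::cerr << "Cannot use dump directory \"" << dir << "\"" << std::endl;
        MPI_Abort( MPI_COMM_WORLD, 1 );
    }
}
// all other processes wait here
MPI_Barrier( MPI_COMM_WORLD );
// now we know the directory exists and is accessible
// let's do our stuff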
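If some ranks still cannot see `Res` right after the barrier (the comments above suggest first-time file creation is slow on this cluster, and network file systems may cache directory lookups on each client; whether that is the cause here is not established), a bounded wait is more robust than the fixed `sleep(2)` in the question. A minimal sketch, with `wait_for_directory` and its parameters as hypothetical names:

```cpp
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: waits until `dir` is visible as a directory,
// checking up to `max_tries` times with `interval_us` microseconds between
// checks. Returns true as soon as stat() reports a directory, false on timeout.
bool wait_for_directory(const char* dir, int max_tries, useconds_t interval_us) {
    for (int i = 0; i < max_tries; ++i) {
        struct stat buf;
        if (stat(dir, &buf) == 0 && S_ISDIR(buf.st_mode)) {
            return true;
        }
        usleep(interval_us);  // back off before checking again
    }
    return false;
}
```

For example, each rank could call `wait_for_directory(dir, 50, 100000)` (up to about 5 seconds) after `MPI_Barrier` and call `MPI_Abort` if it returns `false`, so a missing directory shows up as an explicit error rather than silently missing files.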

Could this work for you?

Gilles