
I'm trying to write several long distributed arrays to a single file using MPI-I/O (OpenMPI implementation) with a shared file pointer. I get the following error messages:

lseek:Invalid argument

WRITE FAILED

I prepared a simplified code snippet to reproduce the issue.

        long long globalUpperBnd = 2200000000;// more than size of int
        long long average = globalUpperBnd/commSize;
        long long length = (commRank == commSize-1) ? globalUpperBnd-(average*commRank) : average;
        char *buf = new char[length];
        ... // fill the buffer

        MPI_File file;
        MPI_File_open(comm, "test.bin", MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &file);

        MPI_File_set_view(file, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
        MPI_File_write_ordered(file, buf, length, MPI_BYTE, MPI_STATUS_IGNORE);
        // the next call fails with the error messages shown above
        MPI_File_write_ordered(file, buf, length, MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&file);

        delete []buf;

It looks like MPI_Offset is just an int, so the second call to MPI_File_write_ordered overflows the offset and it becomes negative. Interestingly, the same amount of data can be written successfully by multiplying globalUpperBnd by 2 and calling MPI_File_write_ordered only once. So it looks like a single MPI_File_write_ordered call avoids the offset overflow somehow.
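The arithmetic behind that suspicion can be checked without MPI. The sketch below is only an illustration of the hypothesis, not a description of what the library does internally: the helper names (`rankLength`, `fitsInInt32`, `wrapToInt32`) are hypothetical, and it assumes the offset would be held in a signed 32-bit integer somewhere along the way.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes written by one rank, mirroring the snippet above.
std::int64_t rankLength(std::int64_t total, int commSize, int rank) {
    const std::int64_t average = total / commSize;
    return (rank == commSize - 1) ? total - average * rank : average;
}

// True if the value is representable as a signed 32-bit integer.
bool fitsInInt32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}

// Value that results if v is truncated to 32 bits and read back as signed
// (computed with 64-bit arithmetic to avoid relying on conversion rules).
std::int64_t wrapToInt32(std::int64_t v) {
    const std::int64_t two32 = std::int64_t{1} << 32;
    std::int64_t r = v % two32;
    if (r < 0) r += two32;  // r is now in [0, 2^32)
    return r > std::numeric_limits<std::int32_t>::max() ? r - two32 : r;
}
```

With 4 ranks, `rankLength(2200000000LL, 4, 3)` is 550000000, which fits in an int, so each per-rank count is fine. The second collective write would start at byte 2200000000, and `wrapToInt32(2200000000LL)` gives -2094967296, a negative value that would be consistent with lseek reporting "Invalid argument".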

I use a 64-bit build of the OpenMPI library.

Is there any workaround for this case?

  • Can you please upload a [MCVE]? Note that `length` should be an `int`, and you should use a derived datatype if it might overflow. Also, which Open MPI version are you running? – Gilles Gouaillardet Oct 23 '18 at 10:33

1 Answer


I think the workaround is to upgrade your MPI implementation. Both OpenMPI and MPICH have been working on these kinds of "huge I/O" bugs -- shared file pointers don't get a lot of attention but I think the last year or so of bug fixes should take care of this.

Rob Latham
  • Before hitting a potential bug in the MPI I/O implementation, the OP should fix his code (e.g., do not pass a `long long` when MPI I/O expects an `int`) – Gilles Gouaillardet Nov 30 '18 at 04:24
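As a rough sketch of the derived-datatype suggestion from the comments: a byte count that may not fit in an `int` can be described as a number of fixed-size chunks plus a remainder, so every count handed to MPI stays an `int`. The `ChunkPlan` struct and `planChunks` function are hypothetical names used only for illustration. The chunk type itself could then be built with `MPI_Type_contiguous(chunkBytes, MPI_BYTE, &chunkType)` and committed before use.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical description of a large buffer as int-sized pieces.
struct ChunkPlan {
    int chunkBytes;      // bytes in one chunk (size of the contiguous type)
    int fullChunks;      // number of whole chunks (count for the chunk type)
    int remainderBytes;  // trailing bytes that do not fill a whole chunk
};

// Split totalBytes into whole chunks of chunkBytes plus a remainder.
ChunkPlan planChunks(std::int64_t totalBytes, int chunkBytes) {
    if (totalBytes < 0 || chunkBytes <= 0) {
        throw std::invalid_argument("sizes must be non-negative / positive");
    }
    const std::int64_t full = totalBytes / chunkBytes;
    if (full > std::numeric_limits<int>::max()) {
        throw std::overflow_error("chunk count does not fit in int");
    }
    // The remainder is always smaller than chunkBytes, so it fits in int.
    return ChunkPlan{chunkBytes, static_cast<int>(full),
                     static_cast<int>(totalBytes % chunkBytes)};
}
```

For example, `planChunks(2200000000LL, 1 << 30)` yields 2 full chunks of 1073741824 bytes and a remainder of 52516352 bytes. With an ordered (shared file pointer) write, issuing the chunks and the remainder as two separate collective calls would change the file layout across ranks, so the two parts would probably need to be combined into a single datatype (for instance with `MPI_Type_create_struct`) and written with a count of 1.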