
I recently came across a bit of not-well-tested legacy code for writing data that's distributed across multiple processes (these are part of an MPI-based parallel computation) into the same file. Is this actually guaranteed to work?

It goes like this:

  • All processes open the same file for writing.

  • Each process calls fseek to seek to a different location within the file. This position may be past the end of the file.

  • Each process then writes a block of data into the file with fwrite. The seek locations and block sizes are such that these writes completely tile a section of the file -- no gaps, no overlaps.

Is this guaranteed to work, or will it sometimes fail horribly? There is no locking to serialize the writes, and in fact they are likely to be starting from a synchronization point. On the other hand, we can guarantee that they are writing to different file positions, unlike other questions which have had issues with trying to write to the "end of the file" from multiple processes.

It occurs to me that the processes may be on different machines that mount the file via NFS, which I suspect probably answers my question -- but, would it work if the file is local?

DigitalRoss
Brooks Moses
  • Would the OS even allow more than one file handle open for writing on the same file? Is there any reason they need to be the same file and not pieced together later? – Nathan Wiebe May 12 '12 at 18:46
  • 1
    In theory it should work if no two or more processes try to write past the end of file, i.e. if you preallocate all the necessary space, all the writes should be successful. I'm not sure what will happen if two processes try to write past the EOF and the file gets extended (and newly created space filled with zeros). – Hristo Iliev May 12 '12 at 19:43
  • I use HDF5/MPIO for such tasks, HDF5 is very useful for dimensional data and provides easy parallel access. – Anycorn May 14 '12 at 00:27

1 Answer


I believe this will typically work, but there is no guarantee that I can find. The POSIX specification for fwrite(3) defers to ISO C, and neither standard says anything about concurrent access to the same file from separate processes.

So I suspect it will typically work, but fseek(3) and fwrite(3) are buffered I/O functions: each process accumulates data in a user-space stdio buffer and flushes it in chunks whose timing and boundaries depend on internal details of the library implementation. So, absolutely no guarantee, but various reasons to expect that it will work.

Now, should the program use lseek(2) and write(2) instead, then I believe you could consider the results guaranteed, since each write is a single system call to a distinct byte range with no user-space buffering -- but now it's restricted to POSIX operating systems.

One thing seems ... odd ... why would an MPI program decide to share its data via NFS and not the message API? It would seem slower, less portable, more prone to trouble, and generally just a waste of the MPI feature set. It certainly is no more distributed given the reliance on a single NFS server.

DigitalRoss
  • The program might want to write its distributed data (e.g. a large matrix scattered throughout the processes) to a non-volatile storage. It's such a common operation in HPC that the MPI standard provides its own parallel I/O API for that purpose. – Hristo Iliev May 12 '12 at 19:39
  • Lately I did unlocked/unsynchronised, parallel `write()`ing to one file, using multiple threads, to non-overlapping parts of the file, including seeking beyond EOF, without any issues. – alk May 13 '12 at 12:56
  • Thanks! FWIW, I figured that your last paragraph was probably the real-world right question -- this is a bit of legacy library functionality that exists because someone six years ago thought a user might want it. Between the "no guarantee" and the "this is a very strange thing to do", I think the answer of what to do with the code is obvious even if it maybe works.... – Brooks Moses May 26 '12 at 02:15