
I run the same Python program concurrently as different processes, and these all want to write to the same hdf5 file using the h5py Python package. However, only a single process may open a given hdf5 file in write mode; otherwise you will get errors like

OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

During handling of the above exception, another exception occurred:

OSError: Unable to create file (unable to open file: name = 'test.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)

I want to resolve this by checking whether the file is already opened in write mode, and if so, waiting a bit and checking again until it is no longer opened in write mode. I have not found any such checking capability in h5py or HDF5. As of now, my solution is based on this:

from time import sleep
import h5py

# Function handling the intelligent hdf5 file opening
def open_hdf5(filename, *args, **kwargs):
    while True:
        try:
            hdf5_file = h5py.File(filename, *args, **kwargs)
            break  # Success!
        except OSError:
            sleep(5)  # Wait a bit
    return hdf5_file

# How to use the function
with open_hdf5(filename, mode='a') as hdf5_file:
    # Do stuff
    ...

I'm unsure whether I like this, as it doesn't seem very gentle. Is there any better way of doing this? Is there any chance that my erroneous attempts to open the file inside the try block can somehow corrupt the write process that is going on in the other process?

jmd_dk
  • I guess you have checked the possibilities mentioned in the manual, including the SWMR feature: http://docs.h5py.org/en/latest/mpi.html#using-parallel-hdf5-from-h5py If you can't or don't want to use those features, why not use a single process that reads/writes to the HDF5 file (see the sketch after these comments)? Usually single-threaded I/O isn't a real bottleneck. Correct usage of the chunk cache and minimization of API calls are, for example, much more important. – max9111 Mar 28 '18 at 11:46
  • My problem is not one of performance. I simply have multiple processes (which in principle have nothing to do with each other) trying to write to the same file at once. It is not a problem to let each process wait until the file is not opened in write mode by any other process before it attempts to open the file itself. – jmd_dk Mar 28 '18 at 15:16
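
For reference, here is a minimal sketch of the single-writer pattern suggested in the first comment. It assumes the processes can be spawned from a single parent and send their results through a multiprocessing.Queue instead of touching the file themselves; the dataset names and payloads are made up for illustration.

import multiprocessing as mp
import h5py

def writer(queue, filename):
    # The only process that ever opens the hdf5 file
    with h5py.File(filename, 'a') as hdf5_file:
        while True:
            item = queue.get()
            if item is None:  # Sentinel: all workers are done
                break
            name, data = item
            hdf5_file[name] = data  # Create a dataset from the payload

def worker(queue, rank):
    # Compute something, then hand it to the writer
    queue.put(('result_{}'.format(rank), [rank] * 3))

if __name__ == '__main__':
    queue = mp.Queue()
    writer_proc = mp.Process(target=writer, args=(queue, 'test.hdf5'))
    writer_proc.start()
    workers = [mp.Process(target=worker, args=(queue, rank)) for rank in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    queue.put(None)  # Tell the writer to finish
    writer_proc.join()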

1 Answer


Judging by some quick research, there is no platform-independent way of checking whether a file is already open in write mode. See "How to check whether a file is_open and the open_status in Python": https://bytes.com/topic/python/answers/612924-how-check-whether-file-open-not

However, since you have already defined a wrapper around opening your hdf5 file for reading and writing, you can always create a "file_name".lock file once a process succeeds in opening the hdf5 file.

Then all the other processes have to do is check os.path.exists('"file_name".lock') to know whether they can open the file in write mode.

Essentially it is not very different from what you do now. The differences are, first, that you can look at your filesystem to see whether one of your processes is accessing the file in write mode, and second, that the test is not the product of an exception, since os.path.exists simply returns a boolean.

Many applications use this kind of trick. When roaming through a CVS repo you often see .lock files lying around...
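
Here is a minimal sketch of that lock-file idea, written as a context manager (the name locked_hdf5 and the 5-second poll are my own choices, not anything from h5py). One refinement over a plain os.path.exists test: the lock is created with os.open using O_CREAT | O_EXCL, which fails atomically if the lock already exists, so two processes cannot race between checking for the lock and creating it.

import os
from contextlib import contextmanager
from time import sleep

import h5py

@contextmanager
def locked_hdf5(filename, *args, **kwargs):
    # Acquire the sidecar lock file before opening the hdf5 file
    lockname = filename + '.lock'
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock already exists,
            # so checking and creating happen in one atomic step
            fd = os.open(lockname, os.O_CREAT | os.O_EXCL)
            os.close(fd)
            break
        except FileExistsError:
            sleep(5)  # Another process holds the lock; wait a bit
    try:
        with h5py.File(filename, *args, **kwargs) as hdf5_file:
            yield hdf5_file
    finally:
        os.remove(lockname)  # Always release the lock, even on error

# How to use the function
with locked_hdf5('test.hdf5', mode='a') as hdf5_file:
    # Do stuff
    ...

The remaining weakness is a process that gets killed while holding the lock: the stale .lock file then blocks everyone until it is removed by hand.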

PilouPili
  • The only problem is that pytables doesn't seem to clean up after itself very well, leaving lots of files open in unpredictable ways. – derchambers Mar 14 '19 at 04:32