
Do I understand correctly that HDF5 files should be closed manually, like this:

```python
import h5py

file = h5py.File('test.h5', 'r')

...

file.close()
```

From the documentation: "HDF5 files work generally like standard Python file objects. They support standard modes like r/w/a, and should be closed when they are no longer in use."

But I wonder: will garbage collection invoke file.close() when the script terminates, or when file is overwritten?

Tom de Geus
  • If you open the file as shown, yes, you should use `file.close()` to close it. Otherwise file integrity is unreliable (it might be OK, it might not). Don't leave it to chance. Alternatively, you can use `with h5py.File('test.h5', 'r') as file:` and `h5py` will take care of closing it appropriately when you exit the block. – kcw78 May 15 '19 at 15:32
  • @kcw78 this is the correct answer, why do you only post this as a comment? :-) – Asmus May 16 '19 at 05:49
  • @kcw78 Thanks! Maybe you can clarify my thinking: I was thinking the C++ way, with `h5py.File` being a class whose destructor would be called when the number of references drops to zero (which would happen when the script finishes or the variable is overwritten). But you seem to be saying that the behaviour is different. – Tom de Geus May 16 '19 at 07:22
  • @TomdeGeus I guess [this answer](https://stackoverflow.com/a/7395906/565489) applies here, too: garbage collection is not always *guaranteed* to happen, so manually closing a file (or using the `with … as f` statement) is the better, more secure way to code. – Asmus May 16 '19 at 12:49
  • @TomdeGeus, I am not familiar with the underlying code (destructors, etc.). My comments are based on my experience. When a file is opened in 'w' mode, data corruption is common when it is not properly closed with one of the methods above. There are multiple SO questions on that topic. Since you are opening in 'r' mode, it may not be an issue for you. – kcw78 May 16 '19 at 13:19
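To make the C++ comparison in the comments more concrete: in CPython, an object's `__del__` finalizer runs as soon as its reference count drops to zero, so rebinding the only variable that refers to it does finalize it right away. Objects caught in a reference cycle, however, wait for the cyclic garbage collector, and other implementations such as PyPy do not finalize promptly at all. The sketch below uses a hypothetical `Resource` class (not h5py) to show both cases under CPython; it does not demonstrate what `h5py.File` itself does on finalization.

```python
import gc

closed = []  # records which hypothetical resources have been "closed"


class Resource:
    """Hypothetical stand-in for a file handle; this is not h5py."""

    def __init__(self, name):
        self.name = name

    def close(self):
        closed.append(self.name)

    def __del__(self):
        # Finalizer that closes the resource, similar in spirit to a C++ destructor
        self.close()


gc.disable()  # keep automatic cycle collection from running mid-demo

r = Resource("plain")
r = None  # only reference gone: CPython runs __del__ immediately
print(closed)  # ['plain']

c = Resource("cycle")
c.self_ref = c  # reference cycle: the refcount never reaches zero on its own
c = None
print(closed)  # still ['plain']: nothing has finalized the cyclic object yet

gc.collect()  # an explicit collection finds and finalizes the cycle
print(closed)  # ['plain', 'cycle']

gc.enable()
```

The takeaway matches the comments: relying on finalization makes *when* the file gets closed an implementation detail, which is why an explicit `close()` or a `with` block is preferred.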

This was answered in the comments a long time ago by @kcw78, but I thought I might as well write it up as a quick answer for anyone else who finds this question.

As @kcw78 says, you should explicitly close files when you are done with them by calling file.close(). From previous experience, I can tell you that h5py files are usually closed properly anyway when the script terminates, but occasionally the files end up corrupted (although I'm not sure whether that ever happens in 'r' mode). Better not to leave it to chance!
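If you call file.close() yourself, one way to make sure it still runs when something in between raises an exception is a `try`/`finally` block. This is a minimal sketch; the scratch path from `tempfile` is an assumption added so the demo does not overwrite an existing test.h5. h5py documents that a closed `File` object evaluates as false, which the last line relies on.

```python
import os
import tempfile

import h5py

# Scratch location so the demo does not overwrite an existing test.h5
path = os.path.join(tempfile.mkdtemp(), "test.h5")

f = h5py.File(path, "w")
try:
    f["data"] = [1, 2, 3]
    # ... further writes; if one of them raised, a bare close() after them would be skipped ...
finally:
    f.close()  # runs whether or not the try block raised

print(bool(f))  # False: a closed h5py.File evaluates as falsy
```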

As @kcw78 also suggests, using a context manager is a good way to go if you want to be safe. In either case, you need to be careful to actually extract the data you want before letting the file close.
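A context manager gives the same guarantee with less code: leaving the `with` block closes the file even if an exception escapes from it. The sketch below simulates a failure with a hypothetical `RuntimeError` after writing, then reopens the file to show the data was still written out on close. The scratch path is again an assumption for the demo.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "test.h5")  # scratch file for the demo

try:
    with h5py.File(path, "w") as f:
        f["data"] = [1, 2, 3]
        raise RuntimeError("simulated failure")  # hypothetical error mid-block
except RuntimeError:
    pass

print(bool(f))  # False: leaving the with-block closed the file despite the error

with h5py.File(path, "r") as g:
    values = g["data"][:]  # copy into memory before g is closed
print(values)  # [1 2 3]
```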

For example:

```python
import h5py

with h5py.File('test.h5', 'w') as f:
    f['data'] = [1, 2, 3]

# Letting the file close and reopening in read-only mode for example purposes

with h5py.File('test.h5', 'r') as f:
    dataset = f.get('data')  # get the h5py.Dataset
    data = dataset[:]  # copy the array into memory
    print(dataset.shape, data.shape)  # appear to behave the same
    print(dataset[0], data[0])  # appear to behave the same

print(data[0], data.shape)  # works the same as above
print(dataset[0], dataset.shape)  # Raises ValueError: Not a dataset
```

dataset[0] raises an error here because dataset is an instance of h5py.Dataset that was tied to f, and it was closed when f was closed. By contrast, data is just a NumPy array containing only the data part of the dataset (i.e. no attributes).
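Since the NumPy array carries no attributes, one option is to copy those into memory as well before the file closes. Below is a minimal sketch using a hypothetical helper `read_dataset` and a hypothetical numeric attribute `scale`; `dset[()]` reads the whole dataset and `dict(dset.attrs)` copies the attributes into an ordinary dictionary.

```python
import os
import tempfile

import h5py
import numpy as np


def read_dataset(path, name):
    """Hypothetical helper: copy a dataset's values and attributes into memory."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        data = dset[()]  # read the whole dataset into a NumPy array
        attrs = dict(dset.attrs)  # copy attributes into a plain dict
    return data, attrs  # both stay usable after the file is closed


path = os.path.join(tempfile.mkdtemp(), "test.h5")  # scratch file for the demo
with h5py.File(path, "w") as f:
    f["data"] = [1, 2, 3]
    f["data"].attrs["scale"] = 0.5  # hypothetical attribute

data, attrs = read_dataset(path, "data")
print(data, attrs)
```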

Homer512
Tim Child