I am running code that takes .hdf5 files (produced by a simulation) as input, analyzes them, and produces some statistics and plots when I run python3 Collector.py in a Konsole shell on Fedora 21 Linux. I have many .py routines in two folders, named gizmo and utilities, in the working directory. The snapshot_index.hdf5 files are transferred separately (using the Globus software) from the machine that runs the simulations into a local directory called output inside the working directory on my laptop. (There are many files, with "index" running from 0 to 600, but I only need two of them, e.g. snapshot_396.hdf5 and snapshot_600.hdf5.) The simulations are run in two different modes: low resolution and high resolution.

When the kilobyte-sized low-resolution .hdf5 files are the input to the part expression (inside the main Python code mentioned above), I can run the code and produce the results, but when I use the megabyte-sized high-resolution .hdf5 files as input to the part expression, I receive the following error message:

# in utilities.simulation.Snapshot():
  read snapshot_times.txt
  reading snapshot index = 600, redshift = 0.000


# in gizmo.gizmo_io.Read():
  reading header from: ./output/snapshot_600.hdf5

Traceback (most recent call last):
  File "Collector.py", line 12, in <module>
    part=gizmo.io.Read.read_snapshots('all', 'index', 600, element_indices=None)
  File "/home/username/Desktop/Projects/PaperMaterials/DM_Dominated_Objects/NewFolder2/covering_fractions/Simulations/gizmo/gizmo_io.py", line 314, in read_snapshots
    'index', snapshot_index, simulation_directory, snapshot_directory, simulation_name)
  File "/home/username/Desktop/Projects/PaperMaterials/DM_Dominated_Objects/NewFolder2/covering_fractions/Simulations/gizmo/gizmo_io.py", line 513, in read_header
    file_in = h5py.File(file_name, 'r')  # open hdf5 snapshot file
  File "/usr/lib/python3.4/site-packages/h5py/_hl/files.py", line 222, in __init__
    fid = make_fid(name, mode, userblock_size, fapl)
  File "/usr/lib/python3.4/site-packages/h5py/_hl/files.py", line 79, in make_fid
    fid = h5f.open(name, h5f.ACC_RDONLY, fapl=fapl)
  File "h5f.pyx", line 71, in h5py.h5f.open (h5py/h5f.c:1809)
OSError: Unable to open file (Truncated file: eof = 933756928, sblock->base_addr = 0, stored_eoa = 1765865624)

I do not understand the meaning of the error, but when I searched the topic I noticed that this is not a new issue with .hdf5 files (Corrupt files when creating HDF5 files without closing them (h5py)), except that the reason for being unable to open the file is different in my case. From what I understand (I am not sure if correctly), the files are too big and hence truncated. If this is the case, what is the solution? And if I am wrong, then what is the issue? Your help is greatly appreciated.

1 Answer

It appears your input file is corrupt. The error message is telling you that the file on disk ends at byte 933756928 (eof), but the HDF5 superblock says the data should extend to byte 1765865624 (stored_eoa), i.e. roughly half the file is missing. What happens if you run the command-line utilities h5dump, h5stat, or h5ls on your input file?
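If you prefer to check from within Python, here is a minimal sketch of the same test (assuming h5py is installed and the path from your traceback); it just tries to open the file read-only and reports the failure:

    # Attempt to open the snapshot read-only; a truncated file
    # raises the same OSError shown in the question.
    import h5py

    file_name = './output/snapshot_600.hdf5'  # path from the traceback
    try:
        with h5py.File(file_name, 'r') as file_in:
            print('file opened OK; top-level groups:', list(file_in.keys()))
    except OSError as exc:
        print('file is unreadable (likely truncated or corrupt):', exc)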

I don't think your problem has anything to do with the design of HDF5, or with the other question you linked (which was about a program that crashed while writing an HDF5 file). Most likely the program that wrote the file has a bug, or your copy of the file was damaged in transit. You can verify this by seeing whether other valid HDF5 programs can work with your files.

Your file appears to be less than 1 GB on disk, and even the full size implied by stored_eoa is only about 1.7 GB, which is not that huge for HDF5.
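As a quick confirmation of the truncation, you can compare the size of the file on disk against the expected size taken from the error message; a minimal sketch, again assuming the path from the traceback:

    # Compare the on-disk size (eof) with the end-of-allocation
    # recorded in the HDF5 superblock (stored_eoa from the OSError).
    import os

    file_name = './output/snapshot_600.hdf5'
    expected_eoa = 1765865624  # stored_eoa from the error message
    actual_size = os.path.getsize(file_name)
    print('on disk: {} bytes; expected at least: {} bytes'.format(actual_size, expected_eoa))
    if actual_size < expected_eoa:
        print('file is truncated; re-transfer it and compare checksums on both ends')

If the sizes differ like this, re-transferring the file and comparing checksums (e.g. md5sum on both machines) will tell you whether the transfer, rather than the simulation, is at fault.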

  • Running all three tools on the files gives "error: unable to open file "snapshot_index.hdf5"", so you are right; the files are corrupt. I just transferred them from a national laboratory supercomputer, which is why I suspected they had been damaged. – Rebel Oct 22 '16 at 19:53