I work a lot with binary flat files that need to remain in their current format to work with legacy codes. However, I would also like to use some HDF5 features, such as attributes and groups. I see in section 5.5.4 of the HDF5 documentation that external data can be linked in. Is there a straightforward way to create HDF5 files and add these external links with h5py?
Are you referring to `5.5.4. External Storage Properties`? It looks like the key to using that is the `H5Pset_external` function. If `h5py` does not give you access to it, you may have to create the file with other `hdf5` utilities. – hpaulj Aug 24 '15 at 19:27
Thanks hpaulj. For some reason I can't view that page due to "banned content", but if that is the case, I guess I will have to write my own tool using the C++ utilities. – Craig Aug 24 '15 at 20:46
Try your own search on the h5py Google group, listed at http://docs.h5py.org/en/latest/contributing.html. I'll delete the earlier comment. I found a few posts dealing with 'external storage', but not many. – hpaulj Aug 25 '15 at 03:16
H5Pset_external is not currently exposed. https://github.com/h5py/h5py/issues/945 – Greg Allen Nov 06 '17 at 21:54
1 Answer
Assuming your binary flat file contains 10000 float32 values starting at a byte offset OFFSET, and that you want to read them back as a 3D array `data` of shape (10, 20, 50) (10 * 20 * 50 = 10000 elements), the code below should do the job:
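```python
import h5py
import numpy

input_file = "filename.raw"
output_file = "filename.h5"
offset = 0  # placeholder: set to OFFSET, the byte position where the data starts
shape = (10, 20, 50)
# The dtype must match the raw file's on-disk byte order, e.g. "<f4" or ">f4".
dtype = numpy.dtype(numpy.float32)
size = dtype.itemsize * shape[0] * shape[1] * shape[2]  # 4 bytes per element

# Only the layout description is stored in filename.h5; the values stay in filename.raw.
with h5py.File(output_file, "w") as h5:
    h5.create_dataset("data",
                      shape=shape,
                      dtype=dtype,
                      external=((input_file, offset, size),))
```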
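To check the approach end to end without a real legacy file, here is a minimal, self-contained sketch. Like the snippet above, it assumes an h5py version whose `create_dataset` accepts the `external=` keyword. It writes a hypothetical raw file with a 16-byte header into a temporary directory, links it into an HDF5 file, and also places the dataset in a group with attributes, which is what the question was after. The header size, the group name `run1`, and the `units`/`source` attributes are illustrative only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical layout for this demo: a 16-byte header, then float32 data in C order.
HEADER_BYTES = 16
shape = (10, 20, 50)
dtype = np.dtype("<f4")  # assumed little-endian float32


def make_raw_file(path):
    """Write a fake legacy file: header bytes followed by the array."""
    data = np.arange(int(np.prod(shape)), dtype=dtype).reshape(shape)
    with open(path, "wb") as f:
        f.write(b"\x00" * HEADER_BYTES)
        f.write(data.tobytes())
    return data


def link_raw_file(raw_path, h5_path):
    """Create an HDF5 file whose dataset points at the raw bytes."""
    size = dtype.itemsize * int(np.prod(shape))
    with h5py.File(h5_path, "w") as h5:
        grp = h5.create_group("run1")  # groups work as usual
        dset = grp.create_dataset(
            "data", shape=shape, dtype=dtype,
            external=[(raw_path, HEADER_BYTES, size)],  # skip the header
        )
        dset.attrs["units"] = "m/s"  # hypothetical metadata
        grp.attrs["source"] = os.path.basename(raw_path)


tmp = tempfile.mkdtemp()
raw_path = os.path.join(tmp, "filename.raw")
h5_path = os.path.join(tmp, "filename.h5")
expected = make_raw_file(raw_path)
link_raw_file(raw_path, h5_path)

# Reading through h5py pulls the values straight from the raw file.
with h5py.File(h5_path, "r") as h5:
    roundtrip = h5["run1/data"][...]
    units = h5["run1/data"].attrs["units"]
print(np.array_equal(roundtrip, expected), units)
```

Absolute paths are used for the external file here; with relative names, HDF5 resolves them at read time, so the working directory matters.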
The `external` keyword expects a sequence of one or more `(filename, offset, size)` tuples. Passing several tuples lets the dataset be built from portions of different files, or from different offsets within the same file.
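Here is a sketch of the multi-tuple case, under the same assumption about h5py. The 4-by-5 int32 dataset is stitched together from two hypothetical raw files, the second of which has an 8-byte header to skip. The segments are used in the order they are listed, and their sizes add up to the dataset's 80 bytes.

```python
import os
import tempfile

import h5py
import numpy as np

tmp = tempfile.mkdtemp()
part_a = os.path.join(tmp, "part_a.raw")  # hypothetical file names
part_b = os.path.join(tmp, "part_b.raw")

full = np.arange(20, dtype="<i4").reshape(4, 5)
full[:2].tofile(part_a)        # first two rows -> file A (40 bytes, no header)
with open(part_b, "wb") as f:  # file B: 8 junk header bytes, then the last two rows
    f.write(b"\xff" * 8)
    f.write(full[2:].tobytes())

row_bytes = full[0].nbytes     # 5 elements * 4 bytes = 20 bytes per row
with h5py.File(os.path.join(tmp, "combined.h5"), "w") as h5:
    dset = h5.create_dataset(
        "data", shape=full.shape, dtype=full.dtype,
        external=[
            (part_a, 0, 2 * row_bytes),  # dataset bytes 0..39
            (part_b, 8, 2 * row_bytes),  # dataset bytes 40..79, header skipped
        ],
    )
    combined = dset[...]
print(combined)
```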

vasole