I have an HDF5 file with 100 "events". Each event contains a variable number of groups called "traces" (roughly 180), and each trace contains 6 datasets which are arrays of 32-bit floats, each ~1000 cells long (this varies slightly from event to event, but is constant within an event). The file was generated with default h5py settings (so no chunking or compression unless h5py applies it on its own).
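For reference, the layout is roughly what the following sketch would produce (the file name, the exact counts and the zero-filled data are placeholders; only the group/dataset names match my file):

import numpy as np
import h5py

# Hypothetical writer reproducing the layout described above with default
# h5py settings (no explicit chunking or compression).
with h5py.File("events.h5", "w") as f:      # placeholder file name
    run = f.create_group("Run_0")
    for i in range(100):                    # 100 events
        event = run.create_group(f"Event_{i}")
        for j in range(180):                # roughly 180 traces per event
            tr = event.create_group(f"Traces_{j}")
            for name in ("SimSignal_X", "SimSignal_Y", "SimSignal_Z",
                         "SimEfield_X", "SimEfield_Y", "SimEfield_Z"):
                # ~1000 float32 cells per dataset; zeros as dummy data
                tr.create_dataset(name, data=np.zeros(1000, dtype=np.float32))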
The readout is not fast: it is ~6 times slower than reading the same data from CERN ROOT TTrees. I know that HDF5 is far from the fastest format on the market, but I would be grateful if you could tell me where the speed is lost.
To read the arrays in traces I do:
# data is an already-open h5py.File object (read mode)
d0keys = data["Run_0"].keys()
for key_1 in d0keys:
    if "Event_" in key_1:
        d1 = data["Run_0"][key_1]
        d1keys = d1.keys()
        for key_2 in d1keys:
            if "Traces_" in key_2:
                d2 = d1[key_2]
                # read the first cell of each of the 6 trace datasets
                v1, v2, v3, v4, v5, v6 = d2['SimSignal_X'][0], d2['SimSignal_Y'][0], d2['SimSignal_Z'][0], d2['SimEfield_X'][0], d2['SimEfield_Y'][0], d2['SimEfield_Z'][0]
Line profiler shows that ~97% of the time is spent in the last line. Now, there are two issues:
- There seems to be no difference between reading only cell [0] and reading all ~1000 cells with [:]. I understood that h5py should be able to read just a chunk of data from the disk, so why is there no difference? (A timing sketch is given after this list.)
- Reading 100 events from an HDD (Linux, ext4) takes ~30 s with h5py and ~5 s with ROOT. The size of 100 events is roughly 430 MB, which gives a readout speed of ~14 MB/s for HDF5 versus ~86 MB/s for ROOT. Both are slow, but ROOT comes much closer to the raw read speed I would expect from a ~4-year-old laptop HDD.
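For the first point, this is roughly how I compare the two access patterns (the file name and the concrete group path are placeholders; note that the OS page cache can mask the difference on a warm read):

import time
import h5py

# Sketch: time a single-cell read vs. a full-array read on one trace dataset.
with h5py.File("events.h5", "r") as data:            # placeholder file name
    ds = data["Run_0/Event_0/Traces_0/SimSignal_X"]  # placeholder path
    t0 = time.perf_counter()
    _ = ds[0]    # read one cell
    t1 = time.perf_counter()
    _ = ds[:]    # read all ~1000 cells (may come from the page cache now)
    t2 = time.perf_counter()
    print(f"[0]: {t1 - t0:.6f} s   [:]: {t2 - t1:.6f} s")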
So where does h5py lose its speed? I would guess that the pure readout should run at roughly the HDD speed. Thus, is the bottleneck (see the sketch after this list):
- Dereferencing the HDF5 address of each dataset (something ROOT does not need to do)?
- Allocating memory in Python?
- Something else?
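To separate the first two possibilities I could time the dataset lookup and the actual read independently, along these lines (a sketch that mirrors the loop above; the file name is again a placeholder):

import time
import h5py

# Sketch: accumulate the time spent locating each dataset in the HDF5
# metadata ("dereferencing") separately from the time spent reading its data.
lookup_s = read_s = 0.0
with h5py.File("events.h5", "r") as data:            # placeholder file name
    run = data["Run_0"]
    for key_1 in run:
        if "Event_" not in key_1:
            continue
        ev = run[key_1]
        for key_2 in ev:
            if "Traces_" not in key_2:
                continue
            d2 = ev[key_2]
            for name in ("SimSignal_X", "SimSignal_Y", "SimSignal_Z",
                         "SimEfield_X", "SimEfield_Y", "SimEfield_Z"):
                t0 = time.perf_counter()
                ds = d2[name]    # opens the dataset object, no data read yet
                t1 = time.perf_counter()
                _ = ds[:]        # actual read of the ~1000 float32 cells
                t2 = time.perf_counter()
                lookup_s += t1 - t0
                read_s += t2 - t1
print(f"dataset lookup: {lookup_s:.2f} s, data read: {read_s:.2f} s")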
I would be grateful for some clues.