This answer is collected from my comments above, along with some additional insights regarding reading MATLAB files saved as HDF5. Frankly it's more complicated than a typical HDF5 file.
First, let's address your error reading the mask attribute. The dataset data_dic_save/displacements/roi_ref_formatted doesn't have an attribute named mask. That's why you get the error. You can confirm with:
print('mask' in d.attrs)
In fact, this dataset doesn't have any attributes. The following code will iterate over all attributes of object d, then print the attribute names and values:
for name in d.attrs:
    print(name, d.attrs[name])
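If you want to try the attribute checks without touching your .mat file, here is a minimal sketch. It builds a throwaway in-memory HDF5 file; the file name demo_attrs.h5 and the units attribute are made up for the demo, and only the dataset path mirrors yours.

```python
import h5py
import numpy as np

# Throwaway in-memory file (nothing is written to disk).
# The file name and the 'units' attribute are hypothetical, for illustration only.
with h5py.File("demo_attrs.h5", "w", driver="core", backing_store=False) as f:
    d = f.create_dataset("data_dic_save/displacements/roi_ref_formatted",
                         data=np.zeros((54, 1)))

    # Membership test: safe way to check before reading an attribute.
    has_mask_before = "mask" in d.attrs

    # .get() returns None for a missing attribute instead of raising KeyError.
    mask_or_none = d.attrs.get("mask")

    # Add one attribute, then collect every name/value pair on the dataset.
    d.attrs["units"] = "pixels"
    found = {name: d.attrs[name] for name in d.attrs}

print(has_mask_before, mask_or_none, found)
```

Checking with `in` or `.get()` first avoids the KeyError you hit when the attribute is missing.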
Now, let's talk about MATLAB files in HDF5 format. (Caveat: I am not an expert in MATLAB or Object References. Everything I know I learned by trial and error.)
I opened your .mat file with HDFView from The HDF Group. I consider this an essential tool when starting with HDF5, as it lets you "see" the file's schema and data in a GUI. (Frankly it's the only way I can figure out the contents of a .mat file. They are NOT self-describing like most HDF5 files.) There is a snapshot of your file at the end of this post for reference.
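If you can't install HDFView, a rough text-only substitute is to walk the file with h5py's visititems(). This is only a sketch: describe() is a helper name I made up, and the demo file contents (including dispinfo/spacing) are placeholders, not the layout of your file.

```python
import h5py
import numpy as np

def describe(f):
    """Return (path, kind, shape, dtype) for every object below the root."""
    rows = []

    def visit(name, obj):
        # visititems() calls this once per group or dataset in the file.
        if isinstance(obj, h5py.Dataset):
            rows.append((name, "dataset", obj.shape, str(obj.dtype)))
        else:
            rows.append((name, "group", None, None))

    f.visititems(visit)
    return rows

# Throwaway in-memory file with placeholder contents (hypothetical names).
with h5py.File("demo_walk.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("data_dic_save/dispinfo/spacing",
                     data=np.array([[5]], dtype="u4"))
    f.create_group("#refs#")
    rows = describe(f)

for row in rows:
    print(row)
```

To use it on a real file, open it read-only with h5py.File(path, "r") and pass the file object to describe().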
All of the datasets in the displacements Group are 54x1 arrays of Object References. The same is true for all datasets in the strains Group. I suspect the datasets in the dispinfo and straininfo Groups provide mapping of some kind. In addition, I found region and size_mask data in the #refs#/XX datasets.
You should read the h5py documentation for details about Object References.
Ref: h5py Object Reference doc
Here's a brief explanation using your file.
The call below returns d as an h5py dataset of shape (54,1) and dtype object. The print statement confirms this.
d = f.get('data_dic_save/displacements/roi_ref_formatted')
print (d.dtype, d.shape)
object (54, 1)
Object References point to other objects in the file. To see how it works, use the first entry in d to index the file object, and you will get dataset Lj (which is in Group /#refs#). You can see that the Lj dataset has shape (1,6) and type u4 (unsigned int).
print (f[ d[0,0] ])
<HDF5 dataset "Lj": shape (1, 6), type "<u4">
That means the following 2 statements point to the same object and yield the same result:
print (f['/#refs#/Lj'][:])
[[3707764736 2 1 1 110 1]]
print (f[ d[0,0] ][:])
[[3707764736 2 1 1 110 1]]
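The same dereferencing step can be applied to every entry in a reference dataset with a loop. Here is a minimal sketch on a throwaway in-memory file. The targets a and b under /#refs# are stand-ins I made up (your file has names like Lj), and it assumes an h5py version that provides h5py.ref_dtype (2.10 or later, as far as I know).

```python
import h5py
import numpy as np

with h5py.File("demo_refs.h5", "w", driver="core", backing_store=False) as f:
    # Stand-ins for the MATLAB-generated targets under /#refs# (hypothetical names).
    refs_group = f.create_group("#refs#")
    a = refs_group.create_dataset("a", data=np.array([[1, 2, 3]], dtype="u4"))
    b = refs_group.create_dataset("b", data=np.array([[4, 5, 6]], dtype="u4"))

    # A (2, 1) dataset of object references, shaped like your (54, 1) datasets.
    d = f.create_dataset("roi_ref_formatted", (2, 1), dtype=h5py.ref_dtype)
    d[0, 0] = a.ref
    d[1, 0] = b.ref

    # Dereference each entry: indexing the file with a reference
    # returns the object that the reference points to.
    resolved = {}
    for ref in d[:, 0]:
        target = f[ref]
        resolved[target.name] = target[:].tolist()

print(resolved)
```

On your file, the equivalent loop would run over all 54 rows of d and read each referenced dataset under /#refs#.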
At this point, I am stuck interpreting the rest of your data. Here are some additional answers that might help you.
This SO Q&A gives a basic explanation on reading .mat files:
read-matlab-v7-3-file-into-python-list-of-numpy-arrays-via-h5py
I wrote a more complete explanation (for reading SVHN datasets). You can access it here:
what-is-the-difference-between-the-two-ways-of-accessing-the-hdf5-group-in-svhn
