Read .mat file in Python

Question

I know how to call any data from the attached file (URL) by Python except any data connected with the name "ROI". For example, you can check "data_dic_save/displacements/roi_ref_formatted". I want to use data from this path in my work (like mask and region). However, I cannot open (read) them. Could you help me?

URL for .mat file: https://www.dropbox.com/s/127vo3uew0fppw5/res.mat?dl=0

Code:

import os
import h5py

with h5py.File(os.path.expanduser('~/res.mat'), 'r') as f:
    d = f.get('data_dic_save/displacements/roi_ref_formatted')
    print(d.attrs['mask'])

Error message:

KeyError: "Can't open attribute (can't locate attribute: 'mask')"

This is a valid question. Your dataset (`data_dic_save/displacements/roi_ref_formatted`) doesn't have an attribute named `mask`. That's why you get the error. You can confirm this with `print('mask' in d.attrs)`. In fact, this dataset doesn't have any attributes. — kcw78, Jul 14 '20 at 14:19
Dear kcw78, thank you for your reply. However, how can I read data from "roi_ref_formatted"? I checked this file in Matlab and got a lot of information from this path. But I need to parse this file with Python... — Eugene Statnik, Jul 14 '20 at 15:32
Your dataset (`roi_ref_formatted`) is an array of object references with shape (54,1). Use `print (d.dtype, d.shape)` to confirm. MATLAB data saved as HDF5 is complicated. It uses object references that point to other objects in the file. To see how it works, `print (f[ d[0,0] ])` and you will get `HDF5 dataset "Lj"`, because `d[0,0]` is an object reference that points to `f['/#refs#/Lj']`. You can see this by comparing `print (f['/#refs#/Lj'][:])` to `print (f[ d[0,0] ][:])`. — kcw78, Jul 14 '20 at 18:15
I checked your recommendations and got the following: object (54, 1) [[3707764736 2 1 1 110 1]] [[3707764736 2 1 1 110 1]] However, Matlab got another result (see attached link for screenshots). Why did I get these strange values??? https://www.dropbox.com/sh/1uli6oeq9yzrn1r/AAAUI0qnymY1qoNRfok9J_Yva?dl=0 — Eugene Statnik, Jul 15 '20 at 06:30
Warning: I am not an expert in MATLAB or Object References. Everything I know I learned by trial and error. I can't answer your "why" question. MATLAB uses a convoluted method to store data in HDF5 format (lots and lots of Object References). I opened your .mat file with **HDFVIEW**. All of the datasets in the `displacements` Group are 54x1 arrays of Object References. The same is true for all datasets in the `strains` Group. I suspect the datasets in the `dispinfo` and `straininfo` groups provide mapping. I found `region` and `size_mask` data in the `#refs#` datasets. — kcw78, Jul 15 '20 at 14:57

score 0 · Answer 1 · answered Jul 16 '20 at 01:31

This answer is collected from my comments above, along with some additional insights regarding reading MATLAB files saved as HDF5. Frankly it's more complicated than a typical HDF5 file.

First, let's address your error reading the mask attribute. The dataset data_dic_save/displacements/roi_ref_formatted doesn't have an attribute named mask. That's why you get the error. You can confirm with:

print('mask' in d.attrs)

In fact, this dataset doesn't have any attributes. The following code will iterate over all attributes of object d, then print the attribute names and values:

for name, value in d.attrs.items():
    print(name, value)

Now, let's talk about MATLAB files in HDF5 format. (Caveat: I am not an expert in MATLAB or Object References. Everything I know I learned by trial and error.)

I opened your .mat file with HDFVIEW from The HDF Group. I consider this an essential tool when starting with HDF5, as it lets you "see" the file's schema and data in a GUI. (Frankly, it's the only way I can figure out the contents of a .mat file. They are NOT self-describing like most HDF5 files.) There is a snapshot of your file at the end of this post for reference.

All of the datasets in the displacements Group are 54X1 arrays of Object References. The same is true for all datasets in the strains Group. I suspect the datasets in the dispinfo and straininfo Groups provide mapping of some kind. In addition, I found region and size_mask data in the #refs#/XX datasets.
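If HDFView is not at hand, a similar overview can be printed from Python. The following is only a minimal sketch: it builds a small in-memory stand-in for the file (the `radius` dataset and the file name `demo_schema.h5` are made up for illustration) and walks it with h5py's `visititems`. For the real file you would open `res.mat` in read mode and pass that handle to `describe` instead.

```python
import h5py
import numpy as np


def describe(h5file):
    """Return one line per object: path, kind, and (for datasets) shape and dtype."""
    lines = []

    def visit(name, obj):
        # visititems calls this once for every group and dataset below the root.
        if isinstance(obj, h5py.Dataset):
            lines.append(f"/{name}  dataset  shape={obj.shape}  dtype={obj.dtype}")
        else:
            lines.append(f"/{name}  group")

    h5file.visititems(visit)
    return lines


# Hypothetical stand-in for res.mat, kept in memory so nothing is written to disk.
with h5py.File("demo_schema.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("#refs#/Lj", data=np.zeros((1, 6), dtype="<u4"))
    f.create_dataset("data_dic_save/dispinfo/radius", data=np.array([[20.0]]))
    schema = describe(f)

for line in schema:
    print(line)
```

Datasets whose dtype prints as `object` are the ones worth a closer look, since in a MATLAB v7.3 file they usually hold references rather than numbers.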

You should read the h5py documentation for details about Object References.
Ref: h5py Object Reference doc

Here's a brief explanation using your file.

The call below returns d as an h5py Dataset object (not yet a NumPy array) with shape (54,1) and dtype object. The print statement confirms this.

d = f.get('data_dic_save/displacements/roi_ref_formatted')
print (d.dtype, d.shape)
object (54, 1)
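A dtype of `object` on its own does not say what the entries are. As a rough sketch, `h5py.check_dtype(ref=...)` can be used to test whether a dataset stores object references. The file below is a made-up in-memory stand-in that reuses the names from your file, not your actual data.

```python
import h5py
import numpy as np

# dtype used by h5py for datasets that store object references
ref_dtype = h5py.special_dtype(ref=h5py.Reference)

# Hypothetical in-memory file; nothing is written to disk.
with h5py.File("demo_dtype.h5", "w", driver="core", backing_store=False) as f:
    target = f.create_dataset("#refs#/Lj", data=np.arange(6, dtype="<u4").reshape(1, 6))
    refs = f.create_dataset("roi_ref_formatted", (2, 1), dtype=ref_dtype)
    refs[0, 0] = target.ref

    # check_dtype returns h5py.Reference for object-reference dtypes, else None.
    holds_refs = h5py.check_dtype(ref=refs.dtype) is h5py.Reference
    plain = h5py.check_dtype(ref=target.dtype) is h5py.Reference
    print(refs.dtype, refs.shape, holds_refs)
    print(target.dtype, target.shape, plain)
```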

Object References point to other objects in the file. To see how it works, dereference the first array entry in d, and you will get dataset Lj (which lives in Group /#refs#). You can see that the Lj dataset has shape (1,6) and type u4 (unsigned 32-bit integer).

print (f[ d[0,0] ]) 
<HDF5 dataset "Lj": shape (1, 6), type "<u4">

That means the following 2 statements point to the same object and yield the same result:

print (f['/#refs#/Lj'][:])
[[3707764736          2          1          1        110          1]]
print (f[  d[0,0] ][:])
[[3707764736          2          1          1        110          1]]
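The same dereferencing step can be repeated for every entry of the reference array. Here is a minimal sketch, again on a made-up in-memory file (the targets `a` and `b` and the file name `demo_refs.h5` are placeholders, not names from your file). It skips null references and collects each target's name and values.

```python
import h5py
import numpy as np

ref_dtype = h5py.special_dtype(ref=h5py.Reference)

with h5py.File("demo_refs.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical targets standing in for the /#refs#/XX datasets.
    a = f.create_dataset("#refs#/a", data=np.array([[1, 2, 3]], dtype="<u4"))
    b = f.create_dataset("#refs#/b", data=np.array([[4, 5, 6]], dtype="<u4"))

    # A (3, 1) reference dataset; the last entry is left unset (a null reference).
    d = f.create_dataset("roi_ref_formatted", (3, 1), dtype=ref_dtype)
    d[0, 0] = a.ref
    d[1, 0] = b.ref

    resolved = {}
    for row, ref in enumerate(d[:, 0]):
        if not ref:  # null references evaluate to False
            continue
        target = f[ref]  # indexing the file with a reference opens its target
        resolved[row] = (target.name, target[()])

for row, (name, values) in resolved.items():
    print(row, name, values.tolist())
```

With the real file, the loop body is the part to reuse: `f[ref]` gives the referenced object, and `target.name` tells you which `/#refs#/XX` entry each row points to.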

At this point, I am stuck interpreting the rest of your data. Here are some additional answers that might help you.

This SO Q&A gives a basic explanation on reading .mat files:
read-matlab-v7-3-file-into-python-list-of-numpy-arrays-via-h5py

I wrote a more complete explanation (for reading SVHN datasets). You can access it here:
what-is-the-difference-between-the-two-ways-of-accessing-the-hdf5-group-in-svhn
