I am importing a *.mat file into Python with a function that I found on Stack Overflow:
import h5py


def read_matlab(filename):
    """
    Import *.mat-file.
    Source: https://stackoverflow.com/a/58026181/5696601
    """
    print(f"Importing '{filename}' ...")

    def conv(path=''):
        p = path or '/'
        paths[p] = ret = {}
        for k, v in f[p].items():
            if type(v).__name__ == 'Group':
                ret[k] = conv(f'{path}/{k}')  # Nested struct
                continue
            v = v[()]  # It's a Numpy array now
            if v.dtype == 'object':
                # HDF5ObjectReferences are converted
                # into a list of actual pointers
                ret[k] = (
                    [r and paths.get(f[r].name, f[r].name) for r in v.flat]
                )
            else:
                # Matrices and other numeric arrays
                ret[k] = v if v.ndim < 2 else v.swapaxes(-1, -2)
        return ret

    paths = {}
    with h5py.File(filename, 'r') as f:
        return conv()


file = read_matlab("test.mat")
I know that the matrix contained in test.mat has the shape (1134, 30807). However, file is a dictionary containing another dictionary with three keys:
file["Y_RMRIO"].keys()
Out[5]: dict_keys(['data', 'ir', 'jc'])
The three values are one-dimensional arrays with the following shapes:
file["Y_RMRIO"]["data"].shape
Out[11]: (22037784,)
file["Y_RMRIO"]["ir"].shape
Out[12]: (22037784,)
file["Y_RMRIO"]["jc"].shape
Out[13]: (1135,)
How can I import the *.mat file so that the matrix keeps its shape of (1134, 30807), or rebuild that shape from the imported data (e.g. as an np.array or pd.DataFrame)?
If I understand it correctly, at least one of the three arrays holds the "positions" of the data points in the matrix. So I guess the data points could be written into an np.zeros array of the right shape at those positions, leaving zeros everywhere else?
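A minimal sketch of that idea, assuming the three arrays follow the compressed sparse column (CSC) layout that MATLAB uses for sparse matrices: data holds the non-zero values, ir the row index of each value, and jc the column pointers, with one more entry than there are columns. The function name dense_from_csc and the toy input are made up for illustration.

```python
import numpy as np


def dense_from_csc(data, ir, jc, n_rows):
    """Rebuild a dense 2-D array from compressed-sparse-column pieces.

    Assumption: data holds the non-zero values, ir the row index of each
    value, and jc the column pointers (len(jc) == n_cols + 1).
    """
    ir = np.asarray(ir, dtype=np.int64)
    jc = np.asarray(jc, dtype=np.int64)
    n_cols = len(jc) - 1
    # Column j owns the entries jc[j]:jc[j + 1], so repeat j that many times.
    cols = np.repeat(np.arange(n_cols), np.diff(jc))
    dense = np.zeros((n_rows, n_cols), dtype=np.asarray(data).dtype)
    dense[ir, cols] = data  # scatter the values, everything else stays zero
    return dense


# Made-up toy input encoding the 3 x 2 matrix [[1, 0], [0, 3], [2, 0]]
data = np.array([1.0, 2.0, 3.0])
ir = np.array([0, 2, 1])
jc = np.array([0, 2, 3])

toy = dense_from_csc(data, ir, jc, n_rows=3)
```

If that assumption holds for test.mat, jc with 1135 entries would mean 1134 columns, so the call would be dense_from_csc(y["data"], y["ir"], y["jc"], n_rows=30807) with y = file["Y_RMRIO"], followed by .T to get (1134, 30807). To keep the matrix sparse instead, scipy.sparse.csc_matrix((data, ir, jc), shape=(n_rows, n_cols)) accepts the same three arrays.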
Any help is welcome. Many thanks in advance!