I am trying to import and read a .mat file in Python. I have tried two approaches, and neither has worked.

Method 1 (In Python):

import scipy.io as sio    
mat = sio.loadmat('path/tmpPBworkspace.mat')

This returns a dictionary similar to:

{'None': MatlabOpaque([ (b'rateQualityOutTrim', b'MCOS', b'dataset', array([[3707764736],
        [         2],
        [         1],
        [         1],
        [         1],
        [         1]], dtype=uint32))],
              dtype=[('s0', 'O'), ('s1', 'O'), ('s2', 'O'), ('arr', 'O')]),
 '__function_workspace__': array([[ 0,  1, 73, ...,  0,  0,  0]], dtype=uint8),
 '__globals__': [],
 '__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Thu May 10 07:11:52 2018',
 '__version__': '1.0'}

I am not sure what went wrong here. I was hoping to get a data frame. Note that for Method 1, I saved the .mat file in a version compatible with SciPy.

In Matlab:

save('path/tmpPBworkspace.mat','rateQualityOutTrim','-v7')

Also tried the other way:

Method 2: h5py

In Matlab:

save('path/tmpPBworkspaceH5.mat','rateQualityOutTrim','-v7.3')

In Python:

import numpy as np
import h5py
f = h5py.File('/GAAR/ustr/projects/PBF/tmpPBworkspaceH5.mat','r')
data = f.get('rateQualityOutTrim/date')
data = np.array(data)

I get

f
Out[154]: <HDF5 file "tmpPBworkspaceH5.mat" (mode r)>

data
array(None, dtype=object)

The array holds only None instead of any data. I am not sure how to access the data here either.

Muhammad Mohsin Khan
SBad
  • The Opaque item is a matlab class object that it can't turn into a numpy array. – hpaulj May 11 '18 at 09:43
  • Thanks hpaulj. Any idea how I can read the .mat file? – SBad May 11 '18 at 10:07
  • what is the matlab object? – hpaulj May 11 '18 at 11:56
  • what do you mean? – SBad May 11 '18 at 12:52
  • I don't know about datasets in MATLAB or whether they are compatible with `pandas`. But to load variables with `loadmat`, you have to write matrices, cells, or structs. – hpaulj May 11 '18 at 15:48
  • When I've looked at H5 files, I've had to systematically search the groups and datasets. `h5dump` may give a quick overview. – hpaulj May 11 '18 at 15:51
  • A previous question about a `MatlabOpaque`: https://stackoverflow.com/questions/32913301/matlab-date-string-results-in-java-lang-string-in-python-scipy-io; and https://stackoverflow.com/questions/15512560/access-mat-file-containing-matlab-classes-in-python – hpaulj May 11 '18 at 21:33
  • If you are really in control of the Matlab part, don't use any Matlab class objects; go with **MATLAB arrays, cells, and structs** instead (as these can be turned into numpy arrays). I'm not very familiar with Matlab, but it might be possible to use `struct(your_class_object)` to convert it. – Nerxis Jun 26 '19 at 09:17
  • Another option is to use a kind of reverse engineering and parse the `__function_workspace__` data (the very long array that you can see in your dict after using `scipy.io.loadmat`) - check [this link](https://nbviewer.jupyter.org/gist/mbauman/9121961). But as you can see, it's not a nice way of working with .mat files. – Nerxis Jun 26 '19 at 09:17
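
The suggestion above to systematically search the groups and datasets of an HDF5 file can be sketched with h5py's `visititems`. This is only an illustration: the file `demo_v73.h5` and the dataset name `values` are made up so that the snippet runs on its own. A real `-v7.3` MAT-file would be opened the same way, though a MATLAB class object may still show up only as opaque references rather than as a plain table.

```python
import os
import tempfile

import h5py
import numpy as np


def list_hdf5_contents(path):
    """Return (name, kind, shape) for every group and dataset in an HDF5 file."""
    entries = []

    def visit(name, obj):
        # visititems calls this once per object; returning None continues the walk.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape))
        else:
            entries.append((name, "group", None))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries


# Build a small stand-in file (hypothetical names) so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_v73.h5")
with h5py.File(demo_path, "w") as f:
    grp = f.create_group("rateQualityOutTrim")
    grp.create_dataset("values", data=np.arange(6.0).reshape(2, 3))

contents = list_hdf5_contents(demo_path)
for name, kind, shape in contents:
    print(name, kind, shape)

# A path that does not exist makes f.get return None,
# and np.array(None) is array(None, dtype=object).
with h5py.File(demo_path, "r") as f:
    missing = f.get("rateQualityOutTrim/date")
```

Listing the names first shows which paths actually exist before calling `f.get` on one of them.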

2 Answers

You can use scipy.io.loadmat for this, provided the file was saved as v7 or earlier (loadmat does not read -v7.3 files, which are HDF5):

from scipy import io

loaded = io.loadmat('path/tmpPBworkspace.mat')

loaded will be a dictionary mapping variable names to arrays.
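
As a minimal round trip, assuming a file that holds only plain numeric arrays (the file name `demo_v7.mat` and the variable names `rates` and `ids` are made up for illustration), the returned dictionary can be inspected like this:

```python
import os
import tempfile

import numpy as np
from scipy import io

# Hypothetical file and variable names, written here only so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "demo_v7.mat")
io.savemat(path, {"rates": np.array([[1.0, 2.0], [3.0, 4.0]]),
                  "ids": np.array([10, 20, 30])})

loaded = io.loadmat(path)

# Keys wrapped in double underscores are file metadata, not saved variables.
variables = {k: v for k, v in loaded.items() if not k.startswith("__")}

for name, value in variables.items():
    print(name, value.shape, value.dtype)
```

Arrays come back at least two-dimensional, as in MATLAB, so the 1-D `ids` is returned with shape `(1, 3)`. A variable that is a MATLAB class object will not appear this way; it shows up as a `MatlabOpaque` entry instead.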


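If you're in control of both the Matlab part and the Pandas part, however, it is much easier to use csvwrite. It expects a numeric matrix (so a class object has to be converted to one first) and writes no header row:

In Matlab:

csvwrite('path/tmpPBworkspaceH5.csv', rateQualityOutTrim)

In Python:

import pandas as pd
df = pd.read_csv('path/tmpPBworkspaceH5.csv', header=None)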
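
A small sketch of the Python side, assuming the CSV contains only numeric rows without a header. The sample text and the column names `id`, `rate`, and `count` are hypothetical and stand in for the real file:

```python
import io

import pandas as pd

# Stand-in for a headerless numeric CSV: comma-separated rows, no column names.
csv_text = "1,0.5,3\n2,0.75,4\n"

# The column names are hypothetical; they have to be supplied on the Python side.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["id", "rate", "count"])
print(df)
```

Without `header=None`, pandas would treat the first data row as column names.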
Ami Tavory
  • Thanks Ami Tavory. I did that and I get similar message as before `{'None': MatlabOpaque([ (b'rateQualityOutTrim', b'MCOS', b'dataset', array([[3707764736], [ 2], [ 1], [ 1], [ 1], [ 1]], dtype=uint32))], dtype=[('s0', 'O'), ('s1', 'O'), ('s2', 'O'), ('arr', 'O')]), '__function_workspace__': array([[ 0, 1, 73, ..., 0, 0, 0]], dtype=uint8), '__globals__': [], '__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Fri May 11 03:33:35 2018', '__version__': '1.0'}` – SBad May 11 '18 at 12:51
  • I'm still not sure how I can extract the data. – SBad May 11 '18 at 12:51
  • @SBad Got it - this is explained very nicely in [this notebook](https://gist.github.com/mbauman/9121961) - it's in Julia, but you can follow the explanations. – Ami Tavory May 11 '18 at 12:54
  • @SBad Incidentally, looking at your question, it looks like you're in control of the Matlab part as well. In this case, there are much easier options. I edited my answer to include one. – Ami Tavory May 11 '18 at 13:00
  • @SBad It's a long answer, since this format is really not meant to be used for exporting - it's reverse engineered, and you probably don't want to write in it in the first place. – Ami Tavory May 11 '18 at 13:02
  • I have a very large dataset (more than a million rows), and exporting to CSV is not optimal and may take a very long time. Saving the data as .mat and importing it in Python may be the best solution, I think. – SBad May 11 '18 at 13:08
I would also try scipy.io.

I have a Matlab "struct" (Auslage_000.mat) that I think of as a kind of nested dictionary. It has several header fields and three data channels (vibration data). I also find Spyder (a Python development environment) helpful, because once the data is loaded you can browse it in a variable explorer (similar to Matlab).

import scipy.io as sio

mat_contents = sio.loadmat('Auslage_000.mat', squeeze_me=True, struct_as_record=False)

When I check the contents of my variable "mat_contents" I get

mat_contents

Out[14]: 
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN, Created on 2019-08-14 13:14:56 by TiePie software (www.tiepie.com).',
 '__version__': '1.0',
 '__globals__': [],
 'tpd': <scipy.io.matlab.mio5_params.mat_struct at 0x1ea3441d438>}

My actual data is in tpd. I can further access the data as follows:

# Access the data via the key 'tpd' and then the attribute 'Data'.
# Data is a numpy array with 3 channels (ch1, ch2, ch3).
Data = mat_contents['tpd'].Data

# Extract channel 1
ch1 = Data[0]

I guess you have to dig a little: at the top level you have dictionary "keys", and below that the fields of the struct are exposed as "attributes" (if the variable in your Matlab file is a struct).
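
The digging can be automated to some extent. The sketch below assumes that struct objects loaded with `struct_as_record=False` expose their field names in a `_fieldnames` attribute, and converts them recursively into plain dictionaries. The file `demo_struct.mat` and the fields `Info` and `SampleRate` are made up as a stand-in for a file like Auslage_000.mat.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio


def struct_to_dict(obj):
    """Recursively convert objects exposing `_fieldnames` into plain dicts."""
    if hasattr(obj, "_fieldnames"):
        return {name: struct_to_dict(getattr(obj, name)) for name in obj._fieldnames}
    return obj


# Hypothetical stand-in file: nested dicts are saved as MATLAB structs.
path = os.path.join(tempfile.mkdtemp(), "demo_struct.mat")
sio.savemat(path, {"tpd": {"Data": np.arange(6.0).reshape(3, 2),
                           "Info": {"SampleRate": 1000.0}}})

mat_contents = sio.loadmat(path, squeeze_me=True, struct_as_record=False)
tpd = struct_to_dict(mat_contents["tpd"])

print(list(tpd))      # field names of the top-level struct
ch1 = tpd["Data"][0]  # first row of Data, accessed by key instead of attribute
```

This handles single nested structs only; arrays of structs would need an extra loop over their elements.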

Muhammad Mohsin Khan
joko