558

Is it possible to read binary MATLAB .mat files in Python?

I've seen that SciPy has alleged support for reading .mat files, but I'm unsuccessful with it. I installed SciPy version 0.7.0, and I can't find the loadmat() method.

Peter Mortensen
Gilad Naor

16 Answers

768

An import is required, import scipy.io...

import scipy.io
mat = scipy.io.loadmat('file.mat')
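
For files that may or may not be in the v7.3 format, one option is to try SciPy first and fall back to h5py. This is only a sketch: `load_mat_any` is a hypothetical helper, it assumes SciPy raises NotImplementedError for v7.3 files, and the h5py branch reads top-level datasets only. The demonstration file is written by SciPy itself, so it exercises the SciPy branch.

```python
import os
import tempfile

import numpy as np
import scipy.io

def load_mat_any(filename):
    """Load a MAT-file with SciPy, falling back to h5py for v7.3 files."""
    try:
        mat = scipy.io.loadmat(filename)
        # Drop the metadata entries that loadmat adds.
        return {k: v for k, v in mat.items() if not k.startswith("__")}
    except NotImplementedError:
        # Assumption: SciPy signals a v7.3 (HDF5) file this way.
        import h5py  # imported lazily; only needed for v7.3 files
        with h5py.File(filename, "r") as f:
            # Top-level datasets only; arrays come back transposed.
            return {k: v[()] for k, v in f.items()
                    if isinstance(v, h5py.Dataset)}

# Demonstration with a file written by SciPy itself (not v7.3).
path = os.path.join(tempfile.mkdtemp(), "file.mat")
scipy.io.savemat(path, {"x": np.arange(3)})
result = load_mat_any(path)
print(result)
```

Note that a one-dimensional array comes back as a 1x3 matrix, because MATLAB has no true 1-D arrays.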
user8408080
Gilad Naor
  • 32
    scipy does not support v7.3 mat-files (see notes [here](http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.loadmat.html)). See the [answer by vikrantt](http://stackoverflow.com/a/19340117/674976) for solution. – texnic May 30 '14 at 15:35
  • 1
    however, you can save mat-files as earlier versions. see: http://www.mathworks.com/help/matlab/import_export/mat-file-versions.html (header: 'Save to Nondefault MAT-File Version') – watsonic Apr 22 '15 at 22:24
  • 7
    e.g. `save('myfile.mat','-v7')` – watsonic Apr 22 '15 at 22:32
  • 1
    Updated link to the SciPy.io tutorial https://docs.scipy.org/doc/scipy/tutorial/io.html @FranckDernoncourt – ZaydH Apr 27 '22 at 10:11
214

Neither scipy.io.savemat nor scipy.io.loadmat works for MAT-files saved in the version 7.3 format. The good part is that version 7.3 MAT-files are HDF5 files, so they can be read with a number of HDF5 tools, and the data can then be converted to NumPy arrays.

For Python, you will need the h5py extension, which requires HDF5 on your system.

import numpy as np
import h5py
f = h5py.File('somefile.mat','r')
data = f.get('data/variable1')
data = np.array(data) # For converting to a NumPy array
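
To find out which paths a file actually contains, the HDF5 tree can be walked before reading anything. The following is a minimal sketch, assuming h5py and NumPy are installed. It first writes a small stand-in HDF5 file (the file name and the `data/variable1` path are illustrative, not from a real MATLAB export) and then inspects it the way one would inspect a v7.3 file. `list_datasets` is a hypothetical helper.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in for a v7.3 MAT-file: a plain HDF5 file with one nested dataset.
path = os.path.join(tempfile.mkdtemp(), "somefile.mat")
with h5py.File(path, "w") as f:
    f.create_dataset("data/variable1", data=np.arange(6.0).reshape(2, 3))

def list_datasets(filename):
    """Return the full path of every dataset in an HDF5 file."""
    names = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            names.append(name)

    with h5py.File(filename, "r") as f:
        f.visititems(visit)
    return names

dataset_names = list_datasets(path)
print(dataset_names)

with h5py.File(path, "r") as f:
    # [()] reads the whole dataset into a NumPy array.
    variable1 = f["data/variable1"][()]

# MATLAB stores arrays column-major, so for a real MATLAB file the array
# usually needs a transpose to match the shape seen in MATLAB.
variable1_matlab_order = variable1.T
```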
Peter Mortensen
vikrantt
  • 8
    This works fine, if you use the '-v7.3' flag in Matlab when saving out your data. Using the default `save` (at least in Matlab R2014b) results in a file that cannot be read using the technique above. If you do use the '-v7.3' flag, the numeric data can be read just fine. – chipaudette May 06 '15 at 17:58
  • 5
    Yes, that's what I said in my post. You need to use -v7.3 while saving in Matlab. You should do that anyways as it uses a better/more supported/standardized format. – vikrantt May 10 '15 at 22:18
  • 5
    Could you please explain what is the relation between _f_ and _data_ in your example? How can I move _f_ to a numpy array? – heracho Jun 06 '17 at 19:02
  • Save a variable with this command from the prompt: `save('filename', '-v7.3', 'var1');` – Kevin Katzke Jul 02 '17 at 19:24
  • 4
    How would I even know that it contains data under data/variable1? – devspartan Jul 05 '20 at 03:09
  • 8
    @devSpartan `f.keys()` will show you what you can access – Packard CPW Oct 22 '20 at 16:17
  • "Unable to open file (file signature not found)" – Ramin Melikov Nov 13 '20 at 19:03
  • 1
    @heracho f is the file object that lets you read data from the hdf5 file. It's similar to using `f = open('myfile.txt')` to read a text file. – ThatNewGuy Apr 03 '21 at 16:31
  • @vikrantt This is a great solution for reading a `struct` but it does **not** work for a Matlab `table`. To read tables, you can't use the `save` function. Instead, you need to utilize the `h5create` and `h5write` functions. – ThatNewGuy Apr 03 '21 at 16:32
33

First save the .mat file as:

save('test.mat', '-v7')

After that, in Python, use the usual loadmat function:

import scipy.io as sio
test = sio.loadmat('test.mat')
Peter Mortensen
Bhanu Pratap Singh
26

There is a nice package called mat4py which can easily be installed using

pip install mat4py

It is straightforward to use (from the website):

Load data from a MAT-file

The function loadmat loads all variables stored in the MAT-file into a simple Python data structure, using only Python’s dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.

Example: Load a MAT-file into a Python data structure:

from mat4py import loadmat

data = loadmat('datafile.mat')

The variable data is a dict with the variables and values contained in the MAT-file.

Save a Python data structure to a MAT-file

Python data can be saved to a MAT-file, with the function savemat. Data has to be structured in the same way as for loadmat, i.e. it should be composed of simple data types, like dict, list, str, int, and float.

Example: Save a Python data structure to a MAT-file:

from mat4py import savemat

savemat('datafile.mat', data)

The parameter data shall be a dict with the variables.

Peter Mortensen
Cleb
  • Note that mat4py gives you a json-like tree of dicts, lists, lists of lists ... -- no numpy at all. (`mat4py/cmd.py my.mat` writes `my.json`, 1 long line.) – denis Nov 14 '18 at 14:20
  • 1
    @denis: Yes, that's also stated above. But a good point indeed: I usually like this structure, e.g. in web applications as [numpy arrays are not JSON serializable](https://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable). – Cleb Nov 14 '18 at 15:34
  • 1
    Encountered: `mat4py.loadmat.ParseError: Can only read from Matlab level 5 MAT-files` – s2t2 Jul 19 '19 at 14:19
  • @s2t2: never ran into this issue before. What matlab version and which scipy version are you using? – Cleb Jul 19 '19 at 15:09
  • ParseError: Unexpected field name length: 43 – Aleksejs Fomins Feb 21 '20 at 09:29
  • @AleksejsFomins: Probably best to open a new question and link to this thread; then it will be easier to help (make sure to provide all info needed to reproduce the error). – Cleb Feb 21 '20 at 10:05
  • @s2t2 mat4py cannot read the new matlab format from version 7.3, which is encoded as hdf5. – Chachni Jul 17 '20 at 15:48
  • Also note that mat4py cannot read complex-valued .mat files – SjonTeflon Sep 15 '21 at 11:59
  • I faced this issue: "mat4py.loadmat.ParseError: Can only read from Matlab level 5 MAT-files" – Abrar_11648 Mar 06 '22 at 03:41
  • 1
    Good solution as a fallback if scipy is not available (for example on a Raspberry Pi). It's also faster with pure number data but slower if there are strings – save_jeff Mar 05 '23 at 08:09
17

If MATLAB R2014b or newer is installed, the MATLAB Engine for Python can be used:

import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat", nargout=1)
Peter Mortensen
Daniel
  • I got this error: ModuleNotFoundError: No module named 'pylab'. – NeoZoom.lua Jun 13 '18 at 01:51
  • 3
    You got the error when trying this answer? That is odd, it does not use pylab. – Daniel Jun 13 '18 at 05:40
  • 1
    For the record, this answer requires a valid Matlab installation and license - it runs Matlab in the background to accomplish the read. And there may be limitations on what format you get the items in that need further work to make them readable. For example, Simulink.Bus objects come in as a "matlab object" and must be processed further, with issues if you want to extract the Bus Element objects. – LightCC Jun 23 '22 at 20:13
15

Reading the file

import scipy.io
mat = scipy.io.loadmat(file_name)

Inspecting the type of MAT variable

print(type(mat))
#OUTPUT - <class 'dict'>

The keys inside the dictionary are MATLAB variables, and the values are the objects assigned to those variables.
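
A small round trip makes the dictionary structure concrete. This sketch assumes NumPy and SciPy are installed; the file name and the variable names `a` and `b` are made up for illustration. Keys that start with a double underscore are metadata added by loadmat, not MATLAB variables.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a small MAT-file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.mat")
scipy.io.savemat(path, {"a": np.array([[1, 2], [3, 4]]), "b": 3.5})

mat = scipy.io.loadmat(path)

# Drop the metadata entries (__header__, __version__, __globals__).
variables = {k: v for k, v in mat.items() if not k.startswith("__")}

for name, value in variables.items():
    print(name, type(value).__name__, value.shape)

# scipy.io.whosmat lists the variables without loading their data.
summary = scipy.io.whosmat(path)
print(summary)
```

Even the scalar `b` comes back as a 1x1 array, since every MATLAB value is at least two-dimensional.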

Peter Mortensen
Daksh
14

There is a great library for this task called: pymatreader.

Just do as follows:

  1. Install the package: pip install pymatreader

  2. Import the relevant function of this package: from pymatreader import read_mat

  3. Use the function to read the matlab struct: data = read_mat('matlab_struct.mat')

  4. Use data.keys() to locate where the data is actually stored.

  • The keys will usually look like: dict_keys(['__header__', '__version__', '__globals__', 'data_opp']), where data_opp is the key that actually stores the data. The name of this key can of course differ between files.
  5. Last step: create your DataFrame (this needs import pandas as pd): my_df = pd.DataFrame(data['data_opp'])

That's it :)

Ofir Shorer
10

There is also the MATLAB Engine for Python by MathWorks itself. If you have MATLAB, this might be worth considering (I haven't tried it myself, but it has a lot more functionality than just reading MATLAB files). However, I don't know whether you are allowed to distribute it to other users. It is probably not a problem if they have MATLAB; otherwise, SciPy may be the right way to go.

Also, if you want to do all the basics yourself, MathWorks provides detailed documentation on the structure of the file format (if the link changes, search for matfile_format.pdf or its title, MAT-File Format). It's not as complicated as I had thought, but it is obviously not the easiest way to go. It also depends on how many features of .mat files you want to support.

I've written a "small" (about 700 lines) Python script which can read some basic .mat-files. I'm neither a Python expert nor a beginner and it took me about two days to write it (using the MathWorks documentation linked above). I've learned a lot of new stuff and it was quite fun (most of the time). As I've written the Python script at work, I'm afraid I cannot publish it... But I can give some advice here:

  • First read the documentation.
  • Use a hex editor (such as HxD) and look into a reference .mat-file you want to parse.
  • Try to figure out the meaning of each byte by saving the bytes to a .txt file and annotate each line.
  • Use classes to save each data element (such as miCOMPRESSED, miMATRIX, mxDOUBLE, or miINT32)
  • The .mat-files' structure is optimal for saving the data elements in a tree data structure; each node has one class and subnodes
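
As a starting point for such a parser, here is a minimal sketch that reads only the 128-byte header and the tag of the first data element of a Level 5 MAT-file. It follows the layout described in the MathWorks format document: 116 bytes of text, an 8-byte subsystem offset, a 2-byte version, and a 2-byte endian indicator. The function name `read_mat5_header` is hypothetical, and the bytes fed to it are synthetic rather than taken from a real file.

```python
import io
import struct

# A few data type codes from the MAT-file format document.
MI_TYPES = {1: "miINT8", 5: "miINT32", 6: "miUINT32", 9: "miDOUBLE",
            14: "miMATRIX", 15: "miCOMPRESSED"}

def read_mat5_header(stream):
    """Parse the 128-byte header and the first data element tag."""
    header = stream.read(128)
    if len(header) < 128:
        raise ValueError("file too short for a Level 5 MAT-file header")
    text = header[:116].decode("ascii", errors="replace").rstrip(" \x00")
    # The indicator is the 16-bit value 'MI'; finding the bytes b"IM"
    # means the file was written little-endian.
    indicator = header[126:128]
    if indicator == b"IM":
        endian = "<"
    elif indicator == b"MI":
        endian = ">"
    else:
        raise ValueError("not a Level 5 MAT-file")
    (version,) = struct.unpack(endian + "H", header[124:126])
    # Each data element starts with an 8-byte tag: type, then byte count.
    type_code, num_bytes = struct.unpack(endian + "II", stream.read(8))
    return {"text": text, "endian": endian, "version": version,
            "first_type": MI_TYPES.get(type_code, type_code),
            "first_size": num_bytes}

# Synthetic header for demonstration only.
fake = (b"MATLAB 5.0 MAT-file, synthetic example".ljust(116)
        + b"\x00" * 8 + struct.pack("<H", 0x0100) + b"IM"
        + struct.pack("<II", 14, 56))
info = read_mat5_header(io.BytesIO(fake))
print(info)
```

A full reader would continue from here by decompressing miCOMPRESSED elements and recursing into the subelements of each miMATRIX, which is where the tree structure mentioned above comes in.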
Peter Mortensen
mozzbozz
10

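To read a .mat file into a pandas DataFrame with mixed data types:

import pandas as pd
import scipy.io as sio

mat = sio.loadmat('file.mat')  # load the MAT-file
mdata = mat['myVar']  # struct variable in the MAT-file
# Unwrap each field of the 1x1 struct array
ndata = {n: mdata[n][0, 0] for n in mdata.dtype.names}
# Keep only the fields that hold a single value
columns = [n for n, v in ndata.items() if v.size == 1]
d = {c: ndata[c][0] for c in columns}
df = pd.DataFrame.from_dict(d)
print(df)  # in a Jupyter notebook, display(df) also works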
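
For struct variables, a shorter route may be `simplify_cells=True`, which in SciPy 1.5 and later returns structs as nested Python dicts. The sketch below is an alternative to the code above, not a replacement for it. The struct `myVar` and its fields are invented and written with savemat so the example is self-contained.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import scipy.io as sio

path = os.path.join(tempfile.mkdtemp(), "file.mat")
# A dict value is saved as a MATLAB struct.
sio.savemat(path, {"myVar": {"name": "sensor", "gain": 2.5,
                             "samples": np.array([1.0, 2.0, 3.0])}})

mat = sio.loadmat(path, simplify_cells=True)
mdata = mat["myVar"]  # a plain dict keyed by field name

# Keep scalar fields only, mirroring the size == 1 filter above.
scalars = {k: v for k, v in mdata.items() if np.ndim(v) == 0}
df = pd.DataFrame([scalars])
print(df)
```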
ASE
5

Apart from scipy.io.loadmat for v4 (Level 1.0), v6, and v7 to 7.2 MAT-files, and h5py.File for 7.3 format MAT-files, there is another type of MAT-file that stores data as text instead of binary. These files are usually created by Octave and can't even be read in MATLAB.

Neither scipy.io.loadmat nor h5py.File can load them (tested on scipy 1.5.3 and h5py 3.1.0); the only solution I found is numpy.loadtxt.

import numpy as np
mat = np.loadtxt('xxx.mat')
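
Since each format needs a different reader, it can help to look at the first bytes of a file before choosing one. The following is a rough sketch using only the standard library. `guess_mat_format` is a hypothetical helper, and its checks rest on assumptions about the formats: binary v5 to v7.2 files and v7.3 files begin with a text header starting with `MATLAB 5.0 MAT-file` or `MATLAB 7.3 MAT-file`, and Octave text files begin with a `#` comment line. v4 files have no text header and are reported as unknown here.

```python
def guess_mat_format(first_bytes):
    """Guess the MAT-file flavor from the first bytes of a file."""
    if first_bytes.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3 (HDF5): try h5py"
    if first_bytes.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5/v6/v7: try scipy.io.loadmat"
    if first_bytes.lstrip().startswith(b"#"):
        return "text (e.g. Octave): try numpy.loadtxt"
    return "unknown"

def guess_mat_format_from_path(filename):
    """Read the start of a file and guess its format."""
    with open(filename, "rb") as f:
        return guess_mat_format(f.read(128))

# Synthetic headers for demonstration only.
print(guess_mat_format(b"MATLAB 7.3 MAT-file, Platform: GLNXA64"))
print(guess_mat_format(b"MATLAB 5.0 MAT-file, Platform: PCWIN64"))
print(guess_mat_format(b"# Created by Octave 6.4.0\n# name: x\n"))
```

For the text format, numpy.loadtxt is likely to work only when the file holds a single numeric matrix, since it skips the `#` lines and expects every remaining row to have the same number of columns.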
MrCrHaM
2

You can also use the hdf5storage library. See its official documentation for details on MATLAB version support.

import hdf5storage

label_file = "./LabelTrain.mat"
out = hdf5storage.loadmat(label_file) 

print(type(out)) # <class 'dict'>
Nannigalaxy
2
  1. Install SciPy:

     pip install scipy

  2. Import the scipy.io.loadmat function and load the file:

     from scipy.io import loadmat
     annots = loadmat('annotation_0001.mat')
     print(annots)

  3. Parse the .mat file structure:

     con_list = [[element for element in upperElement] for upperElement in annots['obj_contour']]

  4. Use a pandas DataFrame to work with the data:

     import pandas as pd
     newData = list(zip(con_list[0], con_list[1]))
     columns = ['obj_contour_x', 'obj_contour_y']
     df = pd.DataFrame(newData, columns=columns)

Reference: https://www.askpython.com/python/examples/mat-files-in-python

1
from os.path import dirname, join as pjoin
import scipy.io as sio
data_dir = pjoin(dirname(sio.__file__), 'matlab', 'tests', 'data')
mat_fname = pjoin(data_dir, 'testdouble_7.4_GLNX86.mat')
mat_contents = sio.loadmat(mat_fname)

You can use the code above to read a .mat file saved in the default format in Python.

1

After struggling with this problem myself and trying other libraries (mat4py is a good one as well, but it has a few limitations), I built this library, matdata2py, which can handle most variable types and, most importantly for me, the "string" type. The .mat file needs to be saved in the -v7.3 format. I hope this is useful for the community.

Installation:

pip install matdata2py

How to use this lib:

import matdata2py as mtp

To load the Matlab data file:

Variables_output = mtp.loadmatfile(file_Name, StructsExportLikeMatlab = True, ExportVar2PyEnv = False)
print(Variables_output.keys()) # with ExportVar2PyEnv = False the variables are as elements of the Variables_output dictionary. 

With ExportVar2PyEnv = True, each variable is available separately as a Python variable with the same name as saved in the .mat file.

Flag descriptions

StructsExportLikeMatlab = True/False: structures are exported in dictionary format (False) or in a dot-based format similar to MATLAB (True).

ExportVar2PyEnv = True/False: all variables are exported in a single dictionary (False) or as separate individual variables into the Python environment (True).

0

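SciPy can load .mat files saved in formats up to v7.2. loadmat returns a dict, so we can use the dict's get() method to retrieve a variable, which is already a NumPy array.

import matplotlib.pyplot as plt
import scipy.io

mat = scipy.io.loadmat('point05m_matrix.mat')

x = mat.get("matrix")  # the variable named "matrix" as a NumPy array
print(type(x))
print(len(x))

plt.imshow(x, extent=[0,60,0,55], aspect='auto')
plt.show()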
Rahul Gulia
0

To load and read .mat files in Python:

  1. Install mat4py. On successful installation, pip reports: Successfully installed mat4py-0.5.0.
  2. Import loadmat from mat4py.
  3. Store the file's location in a variable.
  4. Load the .mat file into a Python data structure.

    pip install mat4py
    from mat4py import loadmat
    boston = r"E:\Downloads\boston.mat"
    data = loadmat(boston, meta=False)
dataninsight