Read .mat file in Python. But the shape of the data changed

Question

 % save .mat file in the matlab    
train_set_x=1:50*1*51*61*23;   
train_set_x=reshape(train_set_x,[50,1,51,61,23]);   
save(['pythonTest.mat'],'train_set_x','-v7.3');

The array created in MATLAB has size (50,1,51,61,23).

I load the .mat file in Python following the instructions at this link.

The code is as follows:

import numpy as np, h5py
f = h5py.File('pythonTest.mat', 'r')
train_set_x = f.get('train_set_x')
train_set_x = np.array(train_set_x)

The output of train_set_x.shape is (23L, 61L, 51L, 1L, 50L). It is expected to be (50L, 1L, 51L, 61L, 23L). So I changed the shape by

train_set_x=np.transpose(train_set_x, (4,3,2,1,0))

I am curious about the change in data shape between Python and MATLAB. Are there any errors in my code?

For earlier `.mat` versions, `scipy.io.loadmat` produces arrays with the same shape as MATLAB, but `order='F'`. Thus it sort of hides this difference. – hpaulj, Sep 01 '16 at 07:02
@hpaulj: What do you mean by early? What's the behaviour change for "late" mat versions? – Eric, Sep 01 '16 at 08:22
MATLAB `save` takes a version option. `V7` and earlier use a native MATLAB file format, not `hdf5`. `loadmat` handles those. I can post an Octave/numpy example if needed. – hpaulj, Sep 01 '16 at 16:01

score 4 · Accepted Answer · answered Sep 01 '16 at 06:47

You do not have any errors in the code. There is a fundamental difference between Matlab and Python (NumPy) in the way they lay out multi-dimensional arrays in memory.
Both Matlab and NumPy store all the elements of a multi-dimensional array as a single contiguous block in memory. The difference is the order of the elements:
Matlab (like Fortran) stores the elements in a column-first fashion, that is, the first dimension varies fastest. In 2D, the memory block 1, 2, 3, 4 is the matrix:

 [1 3;
  2 4]

In contrast, NumPy by default stores the elements in a row-first fashion, that is, the last dimension varies fastest, so the same block is read as:

[1 2;
 3 4]

So a block in memory with size [m,n,k] in Matlab is seen by Python as an array of shape [k,n,m].
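
To make the axis reversal concrete, here is a minimal sketch using a small stand-in array of shape (2, 3, 4) instead of the real data. The variable names are illustrative, and the flat array only imitates the block that Matlab would write; no .mat file is involved.

```python
import numpy as np

# Stand-in for the Matlab array: shape (2, 3, 4), values 1..24,
# filled in column-major (Fortran) order, as Matlab's reshape does.
matlab_like = np.arange(1, 25).reshape((2, 3, 4), order='F')

# The flat memory block as Matlab would lay it out: first index varies fastest.
flat = matlab_like.ravel(order='F')

# A row-major reader interpreting the same block sees the axes reversed.
as_seen_by_numpy = flat.reshape((4, 3, 2))  # default order='C'

# Reversing the axes recovers Matlab's indexing.
restored = as_seen_by_numpy.transpose(2, 1, 0)

print(as_seen_by_numpy.shape)                 # (4, 3, 2)
print(np.array_equal(restored, matlab_like))  # True
```

The same reasoning applies to the five-dimensional array in the question, where the reversal is `(4, 3, 2, 1, 0)`.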

For more information see this wiki page.

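BTW, instead of transposing train_set_x, you might try setting its order to "Fortran" order (col-major as in Matlab), though note that `order` only controls the memory layout, so the shape may still need to be transposed: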

 train_set_x = np.array(train_set_x, order='F')
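
A quick way to check what each option does is to compare shapes and strides on a small stand-in array. This is a sketch, not the original data: `loaded` imitates what h5py returns for a Matlab array of size (2, 3, 4). Here `order='F'` produces a column-major copy with the same shape, while transposing reverses the axes without copying.

```python
import numpy as np

# Stand-in for the array read via h5py: axes reversed relative to Matlab's (2, 3, 4).
loaded = np.arange(24).reshape((4, 3, 2))

# order='F' copies the data into column-major memory; the shape is unchanged.
fortran_copy = np.array(loaded, order='F')

# Transposing reverses the axes and returns a view with reversed strides.
transposed = np.transpose(loaded, (2, 1, 0))  # equivalent to loaded.T

print(fortran_copy.shape, fortran_copy.flags['F_CONTIGUOUS'])  # (4, 3, 2) True
print(transposed.shape, transposed.flags['F_CONTIGUOUS'])      # (2, 3, 4) True
print(np.shares_memory(loaded, transposed))                    # True
```

So only the transpose yields the shape seen in Matlab. The transposed view is already laid out column-major, which is why no extra copy is needed.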

answered Sep 01 '16 at 06:47

Shai

Is the result different from transposing when you set `order='F'`? Or does it only make a difference in memory usage? – Ian Sep 01 '16 at 07:16
In `numpy`, transposing is an O(1) operation: it does not relocate elements in memory, it only changes the array's metadata (its [`strides`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html)). I suppose you can compare the `strides` and `shape` between reading with `order="F"` and transposing. I guess these two methods amount to the same object. – Shai Sep 01 '16 at 07:18
Thanks! It is good to know about the `order` argument, even if transposing yields the same result. – Ian Sep 01 '16 at 07:21
@mwormser I suppose it is more "correct" in this scenario to use the `order="F"` way, since it makes clear that the code expects the data in a different order because of the external program's storage convention. – Shai Sep 01 '16 at 07:57
I tried both `order="F"` and transposing, but the output of `print(train_set_x.shape)` differs: it is `(50L, 1L, 51L, 61L, 23L)` for transposing, but `(23L, 61L, 51L, 1L, 50L)` for `order="F"`. – sha li Sep 01 '16 at 14:22
