Python Code:

import h5py
import hdf5storage
from functools import reduce
import numpy as np
from operator import mul

sz = 128,256,512
a = np.random.normal(size=reduce(mul,sz)).reshape(sz)
save_dict = {'data':a}

spath = r"test.mat"
hdf5storage.savemat(spath, mdict=save_dict, append_mat=False, 
                    store_python_metadata=True, format='7.3')

with h5py.File(spath, 'r') as file:
    b = np.array(file['data'])

# Reads in the correct shape, but is F-contiguous. Scipy doesn't work with v7.3 files.
c = hdf5storage.loadmat(spath)['data']

When a is created, it has shape (128,256,512). However, when I save a to the .mat file using hdf5storage and then load it into b using h5py, b is transposed and has shape (512,256,128). Both arrays are C-contiguous according to their flags.

Is there any way to prevent this transpose from happening? I was under the impression that the HDF5 format stores data in row-major order.

wolfblade87
  • The MATLAB convention is F-contiguous. You can see that in the arrays loaded via `scipy.io.loadmat`. The `HDF5` convention is C-contiguous. I've just looked a bit at the HDF5 files produced by MATLAB/Octave (using `h5py`). I don't know what kinds of games `hdf5storage` is playing with these conventions. https://stackoverflow.com/questions/46044613/how-to-import-mat-v7-3-file-using-h5py – hpaulj Sep 05 '18 at 22:47
  • Unfortunately I don't know of anything else that supports v7.3 .mat files. When I try to use scipy, an error pops up saying to use h5py for that type of file. And I have colleagues who use MATLAB only, so I'm trying to create data files (.mat) that we can pass back and forth easily. – wolfblade87 Sep 05 '18 at 22:52
  • As long as you are interacting with MATLAB you'll have to deal with order differences and transposes. In numpy the first axis is the outermost. In MATLAB it's the last. – hpaulj Sep 05 '18 at 23:26

1 Answer

I looked again at the abc.h5 file described in:

how to import .mat-v7.3 file using h5py

It was created in Octave with:

>> A = [1,2,3;4,5,6];
>> B = [1,2,3,4];
>> save -hdf5 abc.h5 A B

Using h5py:

In [102]: f = h5py.File('abc.h5','r')
In [103]: A = f['A']['value'][:]
In [104]: A
Out[104]: 
array([[1., 4.],
       [2., 5.],
       [3., 6.]])
In [105]: A.shape
Out[105]: (3, 2)
In [106]: A.flags
Out[106]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  ...
In [107]: A.ravel()
Out[107]: array([1., 4., 2., 5., 3., 6.])

So it's a transposed C order array. Apparently that's how MATLAB developers have chosen to store their matrices in HDF5.

I could transpose it in numpy:

In [108]: At = A.T
In [109]: At
Out[109]: 
array([[1., 2., 3.],
       [4., 5., 6.]])
In [110]: At.flags
Out[110]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  ....

As is normal, a C-order array becomes F-order when transposed.

The same Octave matrices, saved with the older .mat format and loaded with scipy.io.loadmat (imported here as io):

In [115]: data = io.loadmat('../abc.mat')
In [116]: data['A']
Out[116]: 
array([[1., 2., 3.],
       [4., 5., 6.]])
In [117]: _.flags
Out[117]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True

So the h5py array, transposed, matches the convention that io.loadmat has been using for quite some time.

I don't have hdf5storage installed on this OS. But judging by your tests, it follows the io.loadmat convention: correct shape, but F order.
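
For the 3d case in the question, the same reasoning can be sketched with numpy alone. This is only an illustration under an assumption: the on-disk layout is simulated by flattening the array in column-major order and reading it back in C order with the axes reversed, which mimics what h5py returned above; it does not call h5py or hdf5storage. The names stored, b, restored and restored_c are made up for the example.

```python
import numpy as np

# Small stand-in for the (128, 256, 512) array in the question.
a = np.arange(24, dtype=float).reshape(2, 3, 4)

# Assumption: simulate a column-major (MATLAB-style) layout by flattening
# in Fortran order, then viewing the same values as a C-order array with
# the axes reversed. This mimics the shape h5py reported for the file.
stored = a.ravel(order='F')
b = stored.reshape(a.shape[::-1])

print(b.shape)                       # reversed axes
print(b.flags['C_CONTIGUOUS'])       # C order, like the h5py result

# Transposing reverses all axes: original shape, but F-contiguous.
restored = b.T
print(restored.shape, restored.flags['F_CONTIGUOUS'])

# If C order is needed downstream, make a contiguous copy.
restored_c = np.ascontiguousarray(restored)
print(np.array_equal(restored_c, a), restored_c.flags['C_CONTIGUOUS'])
```

So the transpose itself costs nothing (it is a view with swapped strides); only the optional np.ascontiguousarray step copies the data. With a real file, the equivalent would presumably be np.array(file['data']).T in the question's h5py code.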

hpaulj