This is my first time working with ".mat" files. I want to use the data in a ".mat" file, but I cannot get at the elements of the arrays. Can anyone help? Because the file is saved in the MATLAB v7.3 format, I cannot use scipy.io.

 import numpy as np
 import h5py

 f = h5py.File('data.mat', 'r')
 for i in f.keys():
     aa = f[i]
     aa = np.array(aa)
     print(i, ':', '\n', aa)

When I use aa=np.array(aa)[0], the output is just the name of a key, but I need the elements stored under that key.

hpaulj
Haj Nasser
  • What do you get if you `print f[i]`? – mkrieger1 Jul 12 '19 at 20:41
  • I found an old SO topic about Matlab v7.3 files: [how-to-read-a-v7-3-mat-file-via-h5py](https://stackoverflow.com/questions/19310808). It explains how Matlab saves the data in a complex structure using "Object References". Read that answer (and the links in it) for help. Working with objects is not simple. I answered a similar SO topic on the SVHN dataset here: [what-is-the-difference-between-the-two-ways-of-accessing-the-hdf5-group-in-svhn](https://stackoverflow.com/questions/55566865). It has step-by-step instructions that explain each h5py call. – kcw78 Jul 15 '19 at 14:15
  • Looks like the `f[i]` are `groups`. You need to look at the elements of those groups (their `keys`), and keep digging down until you get `datasets`. `datasets` can be loaded as `numpy` arrays. I'd suggest reading the `h5py` documentation, especially the parts about groups and datasets. – hpaulj Jul 16 '19 at 18:06

1 Answer

As @hpaulj commented, you need to determine which objects are Groups and which are Datasets. And with Matlab datasets, you need to determine which are arrays and which are objects (objects point to other HDF5 objects in your file). Until you're comfortable with HDF5 and h5py, the easiest way to do this is with the HDFView utility from the HDF Group.

When you're ready to code, you can do it programmatically with isinstance(), testing against the h5py classes.
To test if variable node is a Group use:

if isinstance(node, h5py.Group):

To test if variable node is a Dataset use:

if isinstance(node, h5py.Dataset):

To test if variable node is an Object Dataset (only meaningful once you know node is a Dataset, since Groups have no dtype) use:

if node.dtype == 'object':

You can use visititems(-function-) to loop recursively down an object tree, calling -function- with each object.

Here's a very simple example to demonstrate. Put your filename in the place of foo.hdf5 and run. Warning: This creates a lot of output if you have a lot of groups and datasets. Once you understand your file schema, you should be able to access the datasets. If you find object datasets, read my linked answer to dereference them.

import h5py

def visitor_func(name, node):
    # Called by visititems() once per object; report what kind of object it is.
    if isinstance(node, h5py.Group):
        print(node.name, 'is a Group')
    elif isinstance(node, h5py.Dataset):
        if node.dtype == 'object':
            print(node.name, 'is an object Dataset')
        else:
            print(node.name, 'is a Dataset')
    else:
        print(node.name, 'is an unknown type')

print('testing hdf5 matlab file')
# Open read-only; the with block closes the file when done.
with h5py.File('foo.hdf5', 'r') as h5f:
    h5f.visititems(visitor_func)
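
Once you know the schema, a next step could look like the sketch below. It is only an illustration, not the one right way to do it: load_datasets and deref_dataset are hypothetical helper names (not part of h5py), and deref_dataset assumes every reference is non-null and points at a Dataset rather than a Group. The first helper reads every non-object dataset into a numpy array; the second follows the references held by an object dataset using h5f[ref].

```python
import h5py


def load_datasets(path):
    """Hypothetical helper: return {name: numpy array} for every non-object dataset."""
    arrays = {}

    def collect(name, node):
        # name is the path relative to the file root; node is the h5py object.
        # dtype.kind == 'O' marks object datasets (e.g. references); skip those here.
        if isinstance(node, h5py.Dataset) and node.dtype.kind != 'O':
            arrays[name] = node[()]  # node[()] reads the whole dataset into memory

    with h5py.File(path, 'r') as h5f:
        h5f.visititems(collect)
    return arrays


def deref_dataset(h5f, node):
    """Hypothetical helper: follow each reference in an object dataset.

    Assumes every element is a valid reference to a Dataset (not a Group).
    """
    refs = node[()]  # numpy object array of h5py references
    return [h5f[ref][()] for ref in refs.flat]
```

Usage would be along the lines of data = load_datasets('foo.hdf5'), then indexing data by the names the visitor printed (without the leading slash). Keep in mind that arrays from a Matlab v7.3 file come back with their dimensions in reverse order compared to Matlab, so you may need to transpose them.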
kcw78