I am having difficulty loading in 'str' variables 'Et' (Endtime) and 'St' (Starttime) from a MATLAB .mat file into Python.

I want the same output as in MATLAB, but so far I have not been able to get it. My Python code and its output are below.

# Import numpy and h5py to load in .mat files
import numpy as np
import h5py 

# Load in Matlab ('-v7.3') data
fname = 'directory/file.mat'
f = h5py.File(fname,'r') 

# create dictionary for data
data= {"average":np.array(f.get('average')),"median":np.array(f.get('median')), \
             "stdev":np.array(f.get('stdev')),"P10":np.array(f.get('p10')), \
             "P90":np.array(f.get('p90')),"St":np.str(f.get('stime')), \
             "Et":np.str(f.get('etime'))}
# All other variables are arrays

print(data["Et"])

output:

<HDF5 dataset "etime": shape (1, 6), type "<u4">

I want to have a string in python equal to the string in MATLAB. In other words, I want print(data["Et"]) = '01011212000000' which is the date and time.

How can I solve this?

An example of the data in MATLAB: [image: example]

UpperEastSide
  • At least with an Octave 'hdf5' file, `f['average']` has 2 datasets, 'type' and 'value'. It's a good idea to read both separately. For a string, `type` is `b'sq_string'` and `value` is an (n,1) array of 'int8' dtype. That could, I think, be cast to a Python `bytestring`. There have been a few SO questions that explore loading `hdf5` mat files, though I don't recall if any looked at strings. – hpaulj Feb 13 '19 at 01:36
  • https://stackoverflow.com/questions/41030188/reading-hdf5-format-matlab-file-in-python-with-h5py/43099856#43099856, https://stackoverflow.com/questions/37300974/opening-a-mat-file-using-h5py-and-convert-data-into-a-numpy-matrix/37305759#37305759, https://stackoverflow.com/questions/46044613/how-to-import-mat-v7-3-file-using-h5py/46045117#46045117 – hpaulj Feb 13 '19 at 01:40
  • What is `f.get('etime')`? Is it a group or a dataset? If a group, does it have any keys? – hpaulj Feb 13 '19 at 02:42
  • `f.get('etime')` brings up `<HDF5 dataset "etime": shape (1, 6), type "<u4">` – UpperEastSide Feb 13 '19 at 06:04
  • Try `np.array(f.get('etime'))`. Load it as an array; we might be able to 'decode' it after, as I do in my `In[138]`. – hpaulj Feb 13 '19 at 06:27
  • np.array(f.get('etime')) = [[3707764736 2 1 1 2 1]] – UpperEastSide Feb 13 '19 at 23:48
  • Let's refine that `np.array(f.get('etime'), dtype=' – hpaulj Feb 13 '19 at 23:52
  • np.array(f.get('etime'), dtype=' – UpperEastSide Feb 13 '19 at 23:56
  • I get the following error using the bytes code below: **UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 3: invalid continuation byte**. It is still a string and not a char at this point. I am going to try @machnic method now. – UpperEastSide Feb 14 '19 at 00:06

3 Answers

If you don't mind changing how etime and stime are stored in file.mat, and can save them as type char instead of string, you can read them in Python with bytes(f.get(your_variable)[:]).decode('utf-8'). In your case:

data = {
    "average": np.array(f.get('average')),
    "median": np.array(f.get('median')),
    "stdev": np.array(f.get('stdev')),
    "P10": np.array(f.get('p10')),
    "P90": np.array(f.get('p90')),
    "St": bytes(f.get('stime')[:]).decode('utf-8'),
    "Et": bytes(f.get('etime')[:]).decode('utf-8')
}

I'm sure there is also a way of reading string type, but this might be the simplest solution.
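
A hedged sketch of the char route, in case the UTF-8 decode leaves stray NUL characters: as far as I know, MATLAB v7.3 files store a char array as a column of 16-bit code units, so decoding the code units as UTF-16 may be more robust. The helper name `matlab_char_to_str` is made up for illustration, and the sample array only stands in for what `f['etime'][()]` might return under that assumed layout. It applies to char data only, not to a MATLAB string object like the (1, 6) `<u4` dataset in the question.

```python
import numpy as np

def matlab_char_to_str(arr):
    """Decode an integer array of character codes (one code unit per element) into a str."""
    codes = np.asarray(arr).ravel()  # flatten (n, 1) or (1, n) to 1-D
    # Force little-endian 16-bit code units, then decode them as UTF-16
    return codes.astype("<u2").tobytes().decode("utf-16-le")

# Stand-in for f["etime"][()] when etime was saved as char (assumed layout)
etime = np.array([[48], [49], [48], [49], [49], [50]], dtype=np.uint16)
print(matlab_char_to_str(etime))  # 010112
```

With a real file the call would look like `matlab_char_to_str(f['etime'][()])`, again assuming the variable was saved as char.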

machnic

In Octave

>> x = 1:10;
>> y = reshape(1:12, 3,4);
>> et = '0101121200000';
>> xt = 'a string';
>> save -hdf5 testh5.mat x y et xt

In a numpy session:

In [130]: f = h5py.File('testh5.mat','r')
In [131]: list(f.keys())
Out[131]: ['et', 'x', 'xt', 'y']
In [132]: list(f['y'].keys())
Out[132]: ['type', 'value']
In [133]: f['x/type'].value
Out[133]: b'range'
In [134]: f['y/type'].value
Out[134]: b'matrix'
In [135]: f['y/value'].value
Out[135]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])
In [136]: f['et/type'].value
Out[136]: b'sq_string'
In [137]: f['et/value'].value
Out[137]: 
array([[48],
       [49],
       [48],
       [49],
       [49],
       [50],
       [49],
       [50],
       [48],
       [48],
       [48],
       [48],
       [48]], dtype=int8)
In [138]: f['et/value'].value.ravel().view('S13')
Out[138]: array([b'0101121200000'], dtype='|S13')
In [139]: f['xt/value'].value.ravel().view('S8')
Out[139]: array([b'a string'], dtype='|S8')
In [140]: f.close()

how to import .mat-v7.3 file using h5py

Opening a mat file using h5py and convert data into a numpy matrix

====

`bytes` also works on my file:

In [220]: bytes(f['xt/value'].value)
Out[220]: b'a string'
In [221]: bytes(f['et/value'].value)
Out[221]: b'0101121200000'
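
One caveat for newer installs: `Dataset.value`, used in the session above, was deprecated and then removed in h5py 3.0, and `ds[()]` reads the whole dataset in its place. Below is a small sketch that also avoids hard-coding the length in `view('S13')`. It assumes the Octave layout shown above, where `value` is an (n, 1) int8 array; the helper name `octave_string_to_str` is hypothetical, and the sample array stands in for `f['et/value'][()]`.

```python
import numpy as np

def octave_string_to_str(value):
    """Turn an (n, 1) int8 array of character codes into a Python str."""
    raw = np.asarray(value, dtype=np.int8).ravel()
    # One byte per character, so the raw buffer is already the ASCII text
    return raw.tobytes().decode("ascii")

# Stand-in for f["et/value"][()] from the session above
et_value = np.array([[48], [49], [48], [49], [49], [50], [49], [50],
                     [48], [48], [48], [48], [48]], dtype=np.int8)
print(octave_string_to_str(et_value))  # 0101121200000
```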
hpaulj
  • This doesn't work for me. When using `list(f['average'].keys())` I get the following error: **AttributeError: 'Dataset' object has no attribute 'keys'**. – UpperEastSide Feb 13 '19 at 02:22
  • OK, in my version, `f['average']` is a group with 2 datasets. Apparently in yours `f['average']` is the dataset itself. I don't have your file so can't explore it myself. – hpaulj Feb 13 '19 at 02:41
  • 'average' is a 9 x 365 matrix containing mostly NaNs with a few floats here and there. – UpperEastSide Feb 13 '19 at 02:53
  • Digging around I see there's a greater difference between MATLAB v7.3 and Octave's hdf5. Without a sample file I can't help. – hpaulj Feb 13 '19 at 03:17
  • @hpaulj I have added an image of the data in MATLAB. I couldn't find a way to attach a .mat file – UpperEastSide Feb 13 '19 at 06:09

When I need to load .mat I use scipy and it works fine:

import scipy.io
mat = scipy.io.loadmat('fileName.mat')
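
Whether this works depends on how the file was saved: as far as I know, `scipy.io.loadmat` reads the older MAT formats, while files saved with '-v7.3' are HDF5 and need h5py. One way to check, sketched below, is to look at the text header at the start of the file. The function names are invented for this example, and the header strings are what I understand MATLAB writes, so treat them as an assumption.

```python
def flavor_from_header(header):
    """Classify a MAT-file from the first bytes of its text header."""
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "hdf5"      # open with h5py
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5"        # open with scipy.io.loadmat
    return "unknown"

def mat_file_flavor(path):
    """Read the start of the file at `path` and classify it."""
    with open(path, "rb") as fh:
        return flavor_from_header(fh.read(32))

print(flavor_from_header(b"MATLAB 7.3 MAT-file, Platform: ..."))  # hdf5
```

Calling `mat_file_flavor('fileName.mat')` before loading would then tell you which reader to try first.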
Nimantha
André Pacheco
  • Sounds like the OP has saved the .mat with the newer hdf5 mode, not a `loadmat` compatible one. – hpaulj Feb 13 '19 at 01:10
  • I cannot see any string variables when following this procedure. Output: `dict_keys(['__header__', '__version__', '__globals__', 'average', 'stdev', 'median', 'P90', 'P10', 'None', '__function_workspace__'])` – UpperEastSide Feb 13 '19 at 02:27
  • No Et or St. Note: don't worry about the NaNs - they are supposed to be. – UpperEastSide Feb 13 '19 at 02:28