I have created an HDF5 file in MATLAB containing a matrix of size 1 x 19,000,000. The file is 150 megabytes.

  1. How do I find the best chunk size and deflate level? After experimenting, I found that a chunk size of 1 x 1,000,000 with deflate set to 7 produces a file of 100 megabytes.

  2. My second problem is that I am unable to import this file in Python.

Matlab

h5create('Xn.h5', '/rawdata', size(data), 'ChunkSize', [1 1000000], 'Deflate', 7)

Python

import h5py
filename = 'Xn.h5'
f = h5py.File(filename, 'r')

print("Keys: %s" % f.keys())

I expected Python to handle the data as smoothly as MATLAB does, but it did not.

  • An HDF5 file of 150 MB is certainly not huge. You should not need to worry about compression at all in this regime. Could you post the error message you get when attempting to read it in Python? – Florian Drawitsch Feb 11 '19 at 10:09
  • @FlorianDrawitsch, thanks for your comment. I am not getting an error, but I am also not able to read the dataset inside my HDF5 file. My plan is to use the data and plot it, but Python just keeps running in the background without producing an error or any data. – Youssef Yassine Feb 11 '19 at 10:31
  • If you are not getting an error, what makes you think the dataset is not read properly? What is returned for e.g. `f[list(f.keys())[0]]`? – Florian Drawitsch Feb 11 '19 at 11:54
  • @FlorianDrawitsch, executing `a_group_key = list(f.keys())[0] ;data1 = list (f1[a_group_key])` takes 2 hours – Youssef Yassine Feb 11 '19 at 11:59
  • Are you sure you have written the file properly? Try to create it omitting the chunk size and deflate parameters. Also, please share the `h5write` statement you are executing after your `h5create` statement. – Florian Drawitsch Feb 11 '19 at 12:10
  • Please also note that the `data1 = list (f1[a_group_key])` command you issue converts the returned data into a list. Please execute exactly what I suggested: `f[list(f.keys())[0]]` – Florian Drawitsch Feb 11 '19 at 12:18
  • Matlab's HDF5 output relies on HDF5 references. Have a look at https://stackoverflow.com/a/46797169/3327666 – Pierre de Buyl Feb 12 '19 at 14:27
  • @FlorianDrawitsch, to walk you through my logic: first I made an HDF5 file in MATLAB and its size was 150 MB. Later on I played with the chunk size and deflate to make the file smaller, and I executed the code above. So the original MATLAB code looks like this: `h5create(hd5_file_name_full, '/rawdata', size(sensor_data)); h5write(hd5_file_name_full , '/rawdata', sensor_data);` – Youssef Yassine Feb 13 '19 at 08:46
  • @FlorianDrawitsch, as for the Python command, `f[list(f.keys())[0]]` returns _HDF5 dataset "rawdata": shape (19410432, 1), type "_ – Youssef Yassine Feb 13 '19 at 08:48
  • @Youssef: Here you have it. That is your dataset. To retrieve the data simply execute `dataset = f[list(f.keys())[0]]` followed by e.g. `data = dataset[0:19410432]` – Florian Drawitsch Feb 13 '19 at 10:26
  • @FlorianDrawitsch, is it possible to know what attributes were written inside this HDF5 file? – Youssef Yassine Feb 13 '19 at 12:12

1 Answer

As it turns out, this question is really about how to access data in an HDF5 container from Python.

The h5py documentation has a very good quick start guide.

The process of accessing your data works like this:

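import h5py

filename = 'Xn.h5'  # the file written from MATLAB
f = h5py.File(filename, 'r')
key = list(f.keys())[0]  # name of the first object in the file, here 'rawdata'
dataset = f[key]         # an h5py Dataset handle; no data has been read yet

# To retrieve e.g. the first 10 elements of the dataset, slice it
# (the end index of a slice is exclusive, so 0:10 yields 10 elements)
data = dataset[0:10]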
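
To go one step further, the sketch below reads the whole dataset in one bulk operation and reports its chunking, compression settings, and attributes, which also covers the attribute question from the comments. It is only a sketch under stated assumptions: `Xn.h5` is not available here, so the hypothetical helper `write_demo_file` creates a small stand-in file with made-up values and a made-up `units` attribute, and `read_dataset` is likewise a hypothetical helper name. The `chunks`, `compression`, and `compression_opts` arguments are the h5py counterparts of MATLAB's `'ChunkSize'` and `'Deflate'`. Note that a 1 x N MATLAB matrix shows up as shape (N, 1) in h5py because MATLAB stores arrays in column-major order.

```python
import os
import tempfile

import h5py
import numpy as np


def write_demo_file(path, n=100_000):
    """Write a small stand-in for Xn.h5 (made-up data, similar layout)."""
    data = np.arange(n, dtype="f8").reshape(n, 1)
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "rawdata",
            data=data,
            chunks=(min(n, 10_000), 1),  # counterpart of MATLAB's 'ChunkSize'
            compression="gzip",          # counterpart of MATLAB's 'Deflate'
            compression_opts=7,          # gzip level 0-9
        )
        dset.attrs["units"] = "volts"    # made-up attribute, for illustration


def read_dataset(path, name="rawdata"):
    """Return the data as a 1D NumPy array plus storage info and attributes."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        values = dset[()]  # one bulk read of the entire dataset
        info = {
            "shape": dset.shape,
            "dtype": dset.dtype,
            "chunks": dset.chunks,
            "compression": dset.compression,
            "compression_opts": dset.compression_opts,
            "attrs": dict(dset.attrs),
        }
    # Flatten the (N, 1) column into a plain 1D array, e.g. for plotting
    return values.ravel(), info


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    write_demo_file(path)
    values, info = read_dataset(path)
    print(values[:10])
    print(info)
```

For the real file, calling `read_dataset('Xn.h5')` should be enough. The two-hour runtime reported in the comments most likely comes from `list(dataset)`, which iterates over the dataset one row at a time and issues a separate HDF5 read for each of the roughly 19 million rows; with large compressed chunks, each of those reads may have to decompress a whole chunk again. A single slice or `dataset[()]` avoids that.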