How to save/extract a dataset from HDF5 and convert it into TIFF?

Question

I am trying to import CT scan data into ImageJ/Fiji. There is an HDF5 plugin for ImageJ/Fiji, but the synchrotron CT datasets are so large that it failed to open them. The scan data (an image dataset) is stored as a dataset inside the HDF5 file, so I need to extract the image dataset from the HDF5 file and then convert it into TIFF files.

The HDF5 file path is "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5". The file 'SNT_BTO4_S1_1_1pag_db0005_vol.hdf5' contains several datasets, and the image dataset is located at /entry0000/reconstruction/results/data

So far I have accessed the image dataset using h5py, but I am stuck on how to extract/save the dataset separately from the HDF5 file.

What code is required to extract the image dataset from the HDF5 file?
After that, I am thinking of using Image from PIL to convert the images into TIFF files. Can I get any advice on the code for this?

import numpy as np
import h5py
filename = "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"

with h5py.File(filename, 'r') as hdf:
    base_items = list(hdf.items())
    print('#Items in the base directory:', base_items)

    # entry0000
    G1 = hdf.get('entry0000')
    G1_items = list(G1.items())
    print('#Items in entry0000', G1_items)

    # reconstruction
    G11 = G1.get('/entry0000/reconstruction')
    G11_items = list(G11.items())
    print('#Items in reconstruction', G11_items)

    # results
    G12 = G11.get('/entry0000/reconstruction/results')
    G12_items = list(G12.items())
    print('#Items in results', G12_items)

What of this don't you understand, https://docs.h5py.org/en/stable/high/dataset.html#reading-writing-data — hpaulj, Dec 11 '21 at 20:55

score 0 · Answer 1 · answered Dec 12 '21 at 16:51

Extracting image data from an HDF5 file and converting it to an image is a relatively straightforward two-step process:

1. Access the data in the HDF5 file
2. Convert to an image with cv2 (or PIL)

A simple example is available here: How to extract individual JPEG images from a HDF5 file.

You can apply the same process to your file. The code below is a sketch. It is not complete because you don't show the shape of the image dataset (and the shape affects how to read the data). You also didn't say how many images are in dataset /entry0000/reconstruction/results/data: does it hold a single image or multiple images? If multiple images, which axis is the image counter?

import h5py
import cv2  # for image conversion

filename = "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"

with h5py.File(filename, 'r') as hdf:
    # get the image dataset (no pixel data is read until it is sliced)
    img_ds = hdf['/entry0000/reconstruction/results/data']
    print(f'Image Dataset info: Shape={img_ds.shape}, Dtype={img_ds.dtype}')
    # The following depends on the dataset shape/schema.
    # The code below assumes images are along axis=0.
    for i in range(img_ds.shape[0]):
        # load one image into a NumPy array; on a 3D dataset,
        # img_ds[i,:] is shorthand for img_ds[i,:,:]
        img_arr = img_ds[i,:]
        cv2.imwrite(f'test_img_{i:03}.tiff', img_arr)
        # alternately, skip the intermediate array:
        # cv2.imwrite(f'test_img_{i:03}.tiff', img_ds[i,:])
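
Since the question asks about PIL, here is a minimal sketch of the same loop using Pillow instead of cv2. It assumes a 3D dataset with the image counter on axis 0 and a dtype that Pillow can map to an image mode (float32 in this demo). The function name export_slices_to_tiff, the "slice" file-name prefix, and the tiny synthetic file are hypothetical and exist only to make the sketch runnable; swap in the real file and output folder.

```python
import os
import tempfile

import h5py
import numpy as np
from PIL import Image


def export_slices_to_tiff(h5_path, dataset_path, out_dir, prefix="slice"):
    """Write each 2D slice along axis 0 of a 3D dataset to its own TIFF file."""
    written = []
    with h5py.File(h5_path, "r") as hdf:
        ds = hdf[dataset_path]
        for i in range(ds.shape[0]):
            # Indexing the dataset reads only this one slice from disk.
            slice_2d = ds[i]
            out_path = os.path.join(out_dir, f"{prefix}_{i:04d}.tiff")
            Image.fromarray(slice_2d).save(out_path)
            written.append(out_path)
    return written


# Demo on a small synthetic volume (assumed layout: 3 images of 8 x 10 pixels).
tmp_dir = tempfile.mkdtemp()
demo_h5 = os.path.join(tmp_dir, "demo_vol.hdf5")
demo_volume = np.random.rand(3, 8, 10).astype(np.float32)
with h5py.File(demo_h5, "w") as hdf:
    hdf.create_dataset("/entry0000/reconstruction/results/data", data=demo_volume)

tiff_paths = export_slices_to_tiff(
    demo_h5, "/entry0000/reconstruction/results/data", tmp_dir
)
print(tiff_paths)
```

Reading one slice per iteration keeps memory use to a single image at a time, which matters when the full volume is too large to load at once.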

Note: you don't need to use .get() to get a dataset; you can simply reference the dataset path. Also, when you start from a group object, use the path relative to that group, not the absolute path. (You should modify your code to reflect these changes.) For example, the following are equivalent:

G1 = hdf['entry0000']  
## is the same as     G1 = hdf.get('entry0000')
G11 = hdf['entry0000/reconstruction']  
## is the same as     G11 = hdf.get('entry0000/reconstruction')
## OR referencing G1 group object:
G11 = G1['reconstruction']
## is the same as     G11 = G1.get('reconstruction')
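
The equivalence can be checked on a throwaway file. The sketch below builds a small synthetic HDF5 file (the file name and the zero-filled array are hypothetical) with the same group layout as the question, then reaches the dataset three ways and compares the resulting object names. It also shows the one practical difference I am aware of: .get() returns None for a missing name, while square brackets raise KeyError.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny file with the same group layout as in the question.
tmp_path = os.path.join(tempfile.mkdtemp(), "paths_demo.hdf5")
with h5py.File(tmp_path, "w") as hdf:
    hdf.create_dataset(
        "entry0000/reconstruction/results/data",
        data=np.zeros((2, 4, 4), dtype=np.float32),
    )

with h5py.File(tmp_path, "r") as hdf:
    g1 = hdf["entry0000"]
    via_absolute = hdf["/entry0000/reconstruction/results/data"]
    via_relative = g1["reconstruction/results/data"]  # relative to the group
    via_get = g1.get("reconstruction/results/data")
    names = (via_absolute.name, via_relative.name, via_get.name)

    # .get() returns None for a missing name instead of raising KeyError.
    missing = g1.get("does_not_exist")

print(names)
print(missing)
```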

Hi @kcw78 Thank you for your answer! When I print my image dataset, I got #Items in entry0000/reconstruction/results [('data', )] - Basically, the image data shape seems related to the pixel size because I got the scan data using a detector with 2160 * 2560 pixels! — Yeajin Lee, Dec 12 '21 at 21:55
OK, that's a start. You're right, the 2560X2560 matches the image dimensions. However, a typical image array would have shape (2560,2560,3), where the third index represents the color (or gray scale) channels. So, I'm not sure how to interpret the 2160. Is it the number of images? If so, where and how is the channel data stored? At this point you need to investigate how that dataset was created and stored. Once you have that, you can recreate the images. — kcw78, Dec 13 '21 at 02:51
Thank you for your comments! I can confirm that 2160 is the number of images. The image data is grayscale (unlike a normal JPG file with 3 RGB channels). I can't really explain where the channel data is stored. However, I just realised that I could also get TIFF images from your suggested code! JupyterLab took a long time to run the code, though (possibly because of the 2160 images). Can I quickly ask what the difference would be if we use i or ii (in the line you wrote: for i in range(img_ds.shape[0]):)? — Yeajin Lee, Dec 13 '21 at 17:52
Do you want to rename variable `i` to `ii`? Sure, you can use any variable name you like, as long as it's consistent inside the loop. (I frequently use `cnt` or `img_cnt` b/c it's a little more descriptive.) — kcw78, Dec 13 '21 at 20:42
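
Following up on how to interpret the dataset's dimensions: a quick way to investigate is to print the dataset's layout without loading any pixel data. The sketch below is an illustration under assumptions; describe_dataset is a hypothetical helper and the demo file is synthetic. The shape alone does not say which axis counts images, so the result still has to be checked against how the data was acquired.

```python
import os
import tempfile

import h5py
import numpy as np


def describe_dataset(h5_path, dataset_path):
    """Return basic layout facts about a dataset without reading its data."""
    with h5py.File(h5_path, "r") as hdf:
        ds = hdf[dataset_path]
        return {
            "shape": ds.shape,
            "dtype": str(ds.dtype),
            "chunks": ds.chunks,  # None when the dataset is stored contiguously
            "size_gib": ds.size * ds.dtype.itemsize / 1024**3,
        }


# Demo on a small synthetic volume (assumed layout: 4 images of 6 x 6 pixels).
demo_path = os.path.join(tempfile.mkdtemp(), "describe_demo.hdf5")
with h5py.File(demo_path, "w") as hdf:
    hdf.create_dataset(
        "entry0000/reconstruction/results/data",
        data=np.ones((4, 6, 6), dtype=np.float32),
    )

info = describe_dataset(demo_path, "entry0000/reconstruction/results/data")
print(info)
```

The size estimate gives a rough idea of why exporting every slice of a large volume can take a while.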
