How can I read HDF5 files and plot them as images?

Question

I have this dataset of moon craters in HDF5 format, https://zenodo.org/record/1133969/files/train_craters.hdf5?download=1, but I don't know how to read it and view the images inside the dataset.

score 0 · Answer 1 · answered Sep 15 '21 at 13:30

Reading an HDF5 file is, I think, a duplicate of this post: How to read HDF5 files in Python

For the plotting part, I advise you to check the matplotlib pyplot documentation and dig in to understand how it works.

Documentation : https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html

Tutorial : https://matplotlib.org/stable/tutorials/introductory/pyplot.html

kcw78 · Accepted Answer · 2021-09-15T21:18:41.083

HDF5 is a container of arbitrary data organized into groups and datasets (aka the data schema). To work effectively with the data, you need to understand the schema before you start coding. Ideally, the data source provides the schema. If not, your first step is deducing the schema. You can do this by opening the file and viewing it with HDFView (from the HDF Group), or by writing short code snippets as shown in the linked answer.
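
As a rough illustration of deducing the schema in code, here is a minimal sketch using h5py. The helper name `describe_hdf5` and the `max_items` cutoff are my own choices for illustration, not part of any library; the cutoff is there because a file with many groups would otherwise print a very long listing.

```python
import h5py


def describe_hdf5(path, max_items=20):
    """Return one line per group/dataset, stopping after max_items entries."""
    lines = []

    def visitor(name, obj):
        if isinstance(obj, h5py.Dataset):
            lines.append(f"dataset {name} shape={obj.shape} dtype={obj.dtype}")
        else:
            lines.append(f"group   {name}")
        # visititems stops as soon as the callback returns a non-None value.
        return True if len(lines) >= max_items else None

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return lines
```

Calling `print("\n".join(describe_hdf5("train_craters.hdf5")))` should list the first few groups and datasets, which is usually enough to see the pattern.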

I looked at your file. You said you want to "see the images". You can't do that with this data. I read the file descriptions here: DeepMoon Supplemental Materials. There are 6 files of interest:

name_craters.hdf5 - Pandas HDFStore of crater locations and sizes for images in the dataset.
name_images.hdf5 - Input DEM images and output targets of the dataset, where:
- name = dev for the validation dataset
- name = test for the test dataset
- name = train for the training dataset

So, if you want the training image data you need to download the train_images.hdf5 file. Warning: it is 9.9 GB.
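
Once you have the images file, reading and plotting one image could look something like the sketch below. I have not confirmed the schema of train_images.hdf5 here, so the dataset name is left as a parameter: `"input_images"` in the usage note is an assumption you should check by inspecting the file first, and the code also assumes the dataset is a 3-D array of shape (n_images, height, width). The helper names `load_image` and `show_image` are hypothetical.

```python
import h5py
import matplotlib.pyplot as plt


def load_image(path, dataset, index=0):
    """Read one 2-D image from a 3-D (n_images, height, width) dataset."""
    with h5py.File(path, "r") as f:
        # Indexing an h5py dataset reads only the requested image from disk,
        # which matters for a file of several GB.
        return f[dataset][index]


def show_image(image, title=""):
    """Display a 2-D array as a grayscale image."""
    plt.imshow(image, cmap="gray")
    plt.title(title)
    plt.colorbar()
    plt.show()
```

Usage would be along the lines of `show_image(load_image("train_images.hdf5", "input_images", 0), "image 0")`, with the dataset name replaced by whatever the file actually contains.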

Comments about the train_craters.hdf5 file:
This file was created by Pandas. The file has 30_000 groups, 1 for each image (named "img_xxxxx"). Each group has 4 datasets named: "axis_0", "axis_1", "block0_items", and "block0_values". They have data about each image, but not any image data. For example, both "axis_0" and "block0_items" have the following entries:

Diameter (km)
Lat
Long
x
y
Diameter (pix)

There is data in "block0_values". Here is an example from "img_00000/block0_values":

[[ 5.32341731 -35.10135397 -101.80962272 161.77188631 252.6564721 10.87213217]  
 [ 5.38713978 -34.86402264 -102.38375512 132.62561605 237.8560143 11.00227398]]

From this you get:

Diameter (km)[0] = 5.32341731
Lat[0] = -35.10135397
Long[0] = -101.80962272
x[0] = 161.77188631
y[0] = 252.6564721
Diameter (pix)[0] = 10.87213217

Diameter (km)[1] = 5.38713978
Lat[1] = -34.86402264
Long[1] = -102.38375512
x[1] = 132.62561605
y[1] = 237.8560143
Diameter (pix)[1] = 11.00227398
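
The pairing of "block0_items" (column names) with "block0_values" (one row per crater) shown above can be reproduced in code. This is a minimal sketch with h5py and pandas; the function name `craters_for_image` is hypothetical, and it assumes the column names are stored as byte strings, which is what I would expect from a Pandas-written file.

```python
import h5py
import pandas as pd


def craters_for_image(path, image_key):
    """Build a DataFrame of craters from one "img_xxxxx" group."""
    with h5py.File(path, "r") as f:
        group = f[image_key]
        # Column labels are assumed to be byte strings; decode them to str.
        columns = [name.decode("utf-8") for name in group["block0_items"][()]]
        # One row per crater, one column per label.
        values = group["block0_values"][()]
    return pd.DataFrame(values, columns=columns)
```

For example, `craters_for_image("train_craters.hdf5", "img_00000")` should give a table with one row per crater. Since the file is a Pandas HDFStore, `pd.read_hdf("train_craters.hdf5", key="img_00000")` is likely a shorter route, but it needs the PyTables package installed.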

So, that provides some basic info about each image... but not an array of pixel values you can convert into an image.
