
I have an HDF5 file containing a dataset of shape (30000, 25, 25), and I have already converted it into a numpy array with the code below:

import numpy as np
import h5py

hf = h5py.File('data.h5', 'r')
n1 = np.array(hf["image"][:]) 
x = n1[0:625:30000]
print(x)

In HDFView, after changing its dimensions, I was able to view 30000 individual 25 x 25 arrays. However, with the code above, I am only able to open the first array. The code below shows the first array:

import numpy as np
import h5py

hf = h5py.File('data.h5', 'r')
n1 = np.array(hf["image"][:]) 
x = n1[0:625:30000]
print(x[0])

When I change x[0] to x[1] or anything higher, it says "index 1 is out of bounds for axis 0 with size 1". Is there a way to output all 30000 of these 25 x 25 arrays as shown in HDFView?

kdf

2 Answers

n1 = hf["image"][:]

is enough. The [:] already gives you a numpy array, so there is no need to wrap it in np.array(...) again.
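
For example, a minimal read pattern (a sketch, assuming the file and dataset names from your question) looks like this:

import h5py

# A context manager closes the file automatically when the block ends.
with h5py.File('data.h5', 'r') as hf:
    n1 = hf["image"][:]  # [:] reads the whole dataset as a numpy array

print(n1.shape)  # (30000, 25, 25)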

x = n1[0:625:30000] doesn't do what you think. In Python, slicing is [start:stop:step], so this asks for every 30000th element between indices 0 and 625. Only index 0 qualifies, which is why x has shape (1, 25, 25) and x[1] raises "index 1 is out of bounds for axis 0 with size 1".

x = n1[::625] will return a subset, every 625th element (on the first dimension).
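
You can see the difference in the shapes directly (a sketch, using a stand-in array with your dimensions):

import numpy as np

n1 = np.zeros((30000, 25, 25))  # stand-in for the dataset read above

# step 30000 exceeds stop 625, so only index 0 is selected:
print(n1[0:625:30000].shape)  # (1, 25, 25)

# every 625th image along the first dimension:
print(n1[::625].shape)        # (48, 25, 25)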

`x = n1[0]` is the first (25, 25) block (or image); `x = n1[1]` the second.

In numpy indexing, n1[0] is equivalent to n1[0, :, :], selecting one item on the first dimension.
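
So to get at all 30000 of the 25 x 25 arrays, plain indexing or iteration over the first dimension is enough (a sketch, reusing n1 from above):

first = n1[0]    # shape (25, 25)
second = n1[1]   # shape (25, 25)

# Iterating over a 3-D array yields its 2-D sub-arrays
# along the first dimension, one image at a time.
for img in n1:
    pass  # process each (25, 25) image here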

I have a feeling you've jumped into processing these images (for machine learning or something like that) without learning much Python or NumPy. If you are just following a tutorial, that might work, but you'll get lost if you stray off that guided path.

This other h5py SO post from the same time as your first is relevant:

Efficient way of serializing and retrieving a large number of numpy arrays

hpaulj

Sorry guys, it looks like I just overcomplicated it by adding unnecessary slicing. I am slightly unfamiliar with numpy, so I'll learn more about it to avoid problems. Thank you all once again for your time.

kdf