Reshape(-1) images in a h5 dataset

Question

I am taking a pattern recognition course this semester. My project is a face detection system built from 3000+ images, and I am using Python for it.

So far I have converted each image into a NumPy array and stored it in a list, using the code below:

    import cv2
    import h5py

    # img_path (list of .png file paths) and save_path are assumed to be defined earlier
    imagesData = []

    # read as numpy array, then grayscale, then resize, then vectorize, finally store in a list
    for file in sorted(img_path):
        img = cv2.imread(file)
        img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img_gray = cv2.resize(img_gray, dsize=(150, 150), interpolation=cv2.INTER_CUBIC)
        img_gray = img_gray.reshape(-1)  # (150, 150) -> (22500,)
        imagesData.append(img_gray)

    # save to .h5 file; the label dataset is not stored yet
    # note: in 'a' mode, create_dataset fails if 'dataset' already exists in the file
    hf = h5py.File(save_path, 'a')
    dset = hf.create_dataset('dataset', data=imagesData)
    hf.close()

A small question here: does reshape(-1) mean vectorize? When I print imagesData[0].shape, it shows (22500,); the image was originally (150, 150).

print(imagesData[0].shape)
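
To make the shape change concrete, here is a minimal sketch of what `reshape(-1)` does to one 150 x 150 grayscale array and how to undo it. The array contents are a made-up stand-in, not real image data.

```python
import numpy as np

# Stand-in for one resized grayscale image (150 x 150, 8-bit); not real data.
img_gray = (np.arange(150 * 150) % 256).astype(np.uint8).reshape(150, 150)

flat = img_gray.reshape(-1)        # -1 lets NumPy infer the length: 150 * 150
restored = flat.reshape(150, 150)  # undo the flattening, e.g. before plotting

print(flat.shape)                          # (22500,)
print(restored.shape)                      # (150, 150)
print(np.array_equal(restored, img_gray))  # True
```

The flattening is row by row, so pixel `[i, j]` of the image ends up at index `i * 150 + j` of the 1D array, and reshaping back to `(150, 150)` recovers the original image exactly.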

The images come from a Google Drive folder (consisting of .png images). I use sorted in the loop because I want the NumPy arrays stored in the list in order from the first image to the last (1223 to 5222). I do this because I was given a text file of features arranged in the same order (1223 to 5222), and I am going to store both the image dataset (imagesData) and the label dataset (features) inside a .h5 file. The features text file looks like this:

[image: text file]

Am I doing this right? After storing both the image dataset and the label dataset in the .h5 file, I will load them back and run some machine learning algorithms for my project, so I have to make sure each sample row matches the correct label.
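
As a rough sketch of the storing and loading step described above, the snippet below writes an image dataset and a label dataset into one HDF5 file and reads both back. The data is a made-up stand-in, the dataset name `'labels'` is an assumption, and an in-memory file is used only so the example leaves nothing on disk.

```python
import h5py
import numpy as np

# Hypothetical stand-ins: 5 flattened 150x150 images, each filled with its row index,
# and one placeholder label per image (real labels would come from the text file).
n_images = 5
images = np.stack([np.full(150 * 150, i, dtype=np.uint8) for i in range(n_images)])
labels = np.arange(n_images) * 10

# In-memory HDF5 file for the sketch; pass a real path without these options in practice.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as hf:
    hf.create_dataset("dataset", data=images)  # shape (n_images, 22500)
    hf.create_dataset("labels", data=labels)   # shape (n_images,), name is an assumption

    # Read both datasets back into NumPy arrays before the file closes.
    loaded_images = hf["dataset"][:]
    loaded_labels = hf["labels"][:]

# Row i of the images must belong to label i, so the row counts have to agree.
assert loaded_images.shape[0] == loaded_labels.shape[0]
print(loaded_images.shape, loaded_labels.shape)  # (5, 22500) (5,)
```

HDF5 keeps rows in the order they were written, so the pairing between image row `i` and label `i` only depends on both arrays being built in the same order before they are saved.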

numpy reshape(-1) is explained in this SO Q&A: [What does -1 mean in numpy reshape?](https://stackoverflow.com/q/18691084/10462884) In short, the documentation says _`the new shape should be compatible with the original shape`_. The original array `img_gray` is (150,150). You are reshaping it from a 2D array to 1D, so you get an array of shape `(150*150,) == (22500,)`. Personally, I **would not reshape the image arrays**. It can only lead to problems. - kcw78, Mar 14 '22 at 13:57
What you want to do is _not related to reshaping the image arrays_. You need to match the image order in `imagesData` to the feature order in your text file. I would read the text file first, then use the feature number to read the images in the same order. If images and features are both ordered sequentially (`image_1223, image_1224...image_5222`, etc.) you can take a shortcut...but be very careful. - kcw78, Mar 14 '22 at 14:08
@kcw78, I reshaped because my lecturer told me to store the images as 1D arrays in the h5 file. Yes, I want my images and features both in the same order, to make the machine learning tasks easier. - lee en, Mar 14 '22 at 14:23
Interesting that "_my lecturer told me to store as 1D array into h5 file_". I wonder why they would say that? It's not wrong, but there is no advantage to it. ML algorithms don't need it. They can work with 2D (& 3D) image data. And, if you want to plot the image from the .h5 file, you have to remember to return it to the original shape. However, if the lecturer said to do it, I guess you'd better do it. :-) - kcw78, Mar 14 '22 at 14:59
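
The suggestion in the comments to read the text file first and then load the images in that order could be sketched as below. The file names, the `image_<id>.png` naming pattern, and the helper names `image_id` and `order_paths_by_ids` are all assumptions for illustration; the real layout of the folder and text file is not shown in the question.

```python
import re
from pathlib import Path


def image_id(path):
    """Return the integer ID embedded in a file name like 'image_1223.png'."""
    match = re.search(r"(\d+)", Path(path).stem)
    if match is None:
        raise ValueError(f"no numeric ID in {path!r}")
    return int(match.group(1))


def order_paths_by_ids(paths, feature_ids):
    """Return the image paths in the same order as feature_ids."""
    by_id = {image_id(p): p for p in paths}
    missing = [i for i in feature_ids if i not in by_id]
    if missing:
        raise KeyError(f"no image found for IDs: {missing}")
    return [by_id[i] for i in feature_ids]


# Hypothetical inputs: these names and IDs are made up for illustration.
paths = ["imgs/image_1225.png", "imgs/image_1223.png", "imgs/image_1224.png"]
feature_ids = [1223, 1224, 1225]  # IDs in the order they appear in the text file

ordered = order_paths_by_ids(paths, feature_ids)
print(ordered)
# ['imgs/image_1223.png', 'imgs/image_1224.png', 'imgs/image_1225.png']
```

Driving the order from the feature IDs, rather than from `sorted()` on the file names, also catches missing images. Plain `sorted()` compares strings character by character, so it happens to work when every ID has four digits (1223 to 5222) but would put `image_10.png` before `image_2.png`.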
