I'm having trouble loading a large number of small image files (approx. 90k PNG images) into a single 3D `np.array`. My current solution takes a couple of hours, which is unacceptable. The images are 64x128 pixels.
I have a `pd.DataFrame` called `labels` with the names of the images, and I want to import those images in the same order as they appear in `labels`.
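For context, `labels` looks roughly like this (a made-up example; only the `file_name` column is relevant, and the names have no `.png` extension):

```python
import pandas as pd

# Made-up example of the structure of `labels`; the real DataFrame is built elsewhere.
labels = pd.DataFrame({'file_name': ['img_00001', 'img_00002', 'img_00003']})
```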
My current solution is:
```python
import numpy as np
import cv2 as cv

dataset = np.empty([1, 64, 128], dtype=np.int32)
for file_name in labels['file_name']:
    # read each image as a single-channel (grayscale) array
    array = cv.imread(f'{IMAGES_PATH}/{file_name}.png', cv.IMREAD_GRAYSCALE)
    dataset = np.append(dataset, [array[:]], axis=0)
```
From my timing, the most time-consuming operation is `dataset = np.append(dataset, [array[:]], axis=0)`, which takes around 0.4 s per image.
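I suspect this is because `np.append` copies the whole `dataset` array on every call, so the cost grows with each iteration. For comparison, here is a minimal sketch of filling a preallocated array in place instead (assuming every file loads successfully as a 64x128 grayscale image):

```python
import cv2 as cv
import numpy as np

# Preallocate the full array once, then write each image into its own row (sketch only).
dataset = np.empty((len(labels), 64, 128), dtype=np.uint8)
for i, file_name in enumerate(labels['file_name']):
    dataset[i] = cv.imread(f'{IMAGES_PATH}/{file_name}.png', cv.IMREAD_GRAYSCALE)
```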
Is there any better way to import such files and store them in a `np.array`?
I was thinking about multiprocessing, but I want `labels` and `dataset` to stay in the same order.
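As far as I know, `multiprocessing.Pool.map` returns results in the same order as its input, so I assume something like this rough sketch would keep `dataset` aligned with `labels` (the `load_image` helper is made up for illustration, and `IMAGES_PATH` and `labels` are assumed to be module-level names visible to the worker processes):

```python
import cv2 as cv
import numpy as np
from multiprocessing import Pool

def load_image(file_name):
    # Worker: read one image as a grayscale array (assumes the file exists).
    return cv.imread(f'{IMAGES_PATH}/{file_name}.png', cv.IMREAD_GRAYSCALE)

if __name__ == '__main__':
    with Pool() as pool:
        # map() preserves input order, so row i still corresponds to labels.iloc[i].
        images = pool.map(load_image, labels['file_name'])
    dataset = np.stack(images)
```

I'm not sure whether spawning processes is actually worth it here, though, since the work is mostly disk I/O.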