I have a large dataset of around 10,000 images imported from Google Drive that I want to turn into a numpy array so I can train my machine learning model. The problem is that my current approach is very slow and consumes a lot of RAM.
from PIL import Image
import numpy as np
import glob

# read every image into memory and stack everything into one array
train_images = glob.glob('/content/drive/MyDrive/AICW/trainy/train/*.jpg')
x_train = np.array([np.array(Image.open(image)) for image in train_images])
This code was still running after 30 minutes, and even when I did manage to get a numpy array, it's a collection of images of different sizes and dimensions (e.g. some are 450 × 600 and others are 500 × 600), which is going to be problematic when I feed them into my model. There must be a way that's more time- and space-efficient, right? I've put a rough sketch of what I'm considering at the end of this post.
P.S. I'm running all of this on Google Colab. The total number of images is 10,270. Sizes vary from image to image, but they are generally around 450 × 600 × 3.
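
For reference, here is a minimal sketch of the direction I'm considering: pre-allocating a single uint8 array and resizing every image to one fixed size as it is loaded, so I avoid building a huge Python list and the ragged shapes at the same time. The 224 × 224 target size is just a placeholder I picked for this example, not something my model requires; at that size the array works out to roughly 1.5 GB for ~10,270 images.

from PIL import Image
import numpy as np
import glob

# Placeholder target size; I haven't settled on the final input size yet.
TARGET_W, TARGET_H = 224, 224

train_images = sorted(glob.glob('/content/drive/MyDrive/AICW/trainy/train/*.jpg'))

# Pre-allocate one uint8 array instead of growing a Python list of arrays.
# ~10,270 images at 224 x 224 x 3 is roughly 1.5 GB.
x_train = np.empty((len(train_images), TARGET_H, TARGET_W, 3), dtype=np.uint8)

for i, path in enumerate(train_images):
    with Image.open(path) as img:
        # convert('RGB') guards against grayscale files,
        # resize() forces every image to the same shape.
        x_train[i] = np.asarray(img.convert('RGB').resize((TARGET_W, TARGET_H)))

I'm not sure this actually fixes the 30-minute runtime, though, since I suspect most of that time is spent reading thousands of small files from the mounted Drive rather than on the numpy conversion itself. Is that the real bottleneck?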