I have a very large array of images (multiple GBs) and want to split it using numpy. This is my code:
import numpy as np

images = ...  # the very large array, which contains a lot of images
# images.shape == (50000, 256, 256)
indices = ...  # array of inclusive (start, end) ranges that group the images, like [(0, 300), (301, 580), (581, 860), ...]
train_indices, test_indices = ...  # arrays of group indices like [1, 6, 8, 19] that determine which groups go into the train set and which into the test set
images_train = np.empty([0, images.shape[1], images.shape[2]])
images_test = np.empty([0, images.shape[1], images.shape[2]])
# assign the image groups to either train or test set
for i, rng in enumerate(indices):
    group_range = range(rng[0], rng[1] + 1)
    if i in train_indices:
        images_train = np.concatenate((images_train, images[group_range]))
    else:
        images_test = np.concatenate((images_test, images[group_range]))
The problem with this code is that images_train and images_test are new arrays, and every image is copied into them. This doubles the memory needed to run the program.
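As far as I understand, the copies come from two places: indexing with a range object and np.concatenate. Here is a minimal sketch on a tiny stand-in array (the sizes are made up for illustration and are not my real data) that checks this with np.shares_memory:

```python
import numpy as np

# Tiny stand-in for the real array (hypothetical sizes, for illustration only).
images = np.zeros((10, 4, 4))

# Indexing with a range object is treated as integer-array ("fancy") indexing
# and returns a copy of the selected images.
group_copy = images[range(2, 6)]

# Basic slicing over the same rows returns a view that shares memory with images.
group_view = images[2:6]

copy_shares_memory = np.shares_memory(images, group_copy)  # False
view_shares_memory = np.shares_memory(images, group_view)  # True

# np.concatenate always allocates a new array, so even views are copied
# as soon as they are concatenated.
joined = np.concatenate((group_view, images[7:9]))
joined_shares_memory = np.shares_memory(images, joined)  # False
```

So a single slice can reuse the memory, but once several groups are joined into one array the data seems to be duplicated either way.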
Is there a way to split my images array into images_train and images_test without having to copy the images, but rather reuse them?
My intention with the indices is to group the images into roughly 150 groups, where all images from one group should end up either in the train set or in the test set.