
I have a very large array of images (multiple GB) and want to split it using NumPy. This is my code:

images = ... # this is the very large array which contains a lot of images.
images.shape => (50000, 256, 256)

indices = ... # array containing ranges, that group the images array like [(0, 300), (301, 580), (581, 860), ...]

train_indices, test_indices = ... # both arrays contain indices like [1, 6, 8, 19] which determine which groups are in the train and which are in the test group

images_train, images_test = np.empty([0, images.shape[1], images.shape[2]]), np.empty([0, images.shape[1], images.shape[2]])

# assign the image groups to either train or test set
for (i, rng) in enumerate(indices):
    group_range = range(rng[0], rng[1]+1)
    if i in train_indices:
        images_train = np.concatenate((images_train, images[group_range]))
    else:
        images_test = np.concatenate((images_test, images[group_range]))

The problem with this code is that images_train and images_test are new arrays, and every image is copied into them. This roughly doubles the memory needed to run the program.

Is there a way to split my images array into images_train and images_test without having to copy the images, but rather reuse them?

My intention with the indices is to group the images into roughly 150 groups, where the images from one group should end up either all in the train set or all in the test set.

desertnaut
Codey
  • Would those ranges represent slices of equal lengths? – Divakar Jun 10 '20 at 14:14
  • No, the ranges vary. So `indices` is more like `[(0, 300), (301, 578), (579, 850), ...]` – Codey Jun 10 '20 at 14:16
  • Individual slices will be `views`, but concatenating them creates a copy. – hpaulj Jun 10 '20 at 14:36
  • Yes I've read that from here https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html. But is there an approach that could help me? – Codey Jun 10 '20 at 15:03
  • Since you are trying to concatenate rather than copy, the question has already been asked here: https://stackoverflow.com/questions/7869095/concatenate-numpy-arrays-without-copying – SeF Jun 10 '20 at 17:31
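hpaulj's point in the comments (individual slices are views, but concatenating them copies) can be checked directly with np.shares_memory:

```python
import numpy as np

A = np.arange(20)
B = A[5:16:2]               # basic slice -> a view into A
C = np.concatenate((B, B))  # concatenation -> a new array (copy)

print(np.shares_memory(A, B))  # True: a slice shares memory with its parent
print(np.shares_memory(A, C))  # False: the concatenated result does not
B[0] = 99                      # writing through the view changes A
print(A[5])                    # 99
```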

1 Answer


Without runnable code it's difficult to understand the details, but I can try to give some ideas. If you have images_train and images_test, then you will probably use them to train and to test with commands something like

.fit(images_train);
.score(images_test)

An approach might be to not build images_train and images_test at all, but to use parts of images directly:

.fit(images[...]);
.score(images[...])

Now the question is: what should go in the [...] brackets? Or is there a NumPy operation that extracts the right images[...]? First we have to think about what we should avoid:

  • a Python-level for loop over a large array is slow
  • iteratively growing an array, as in A = np.concatenate((A, B[j])), copies everything on every step
  • "fancy indexing" always creates a copy, as in group_range = range(rng[0], rng[1]+1); images[group_range]
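Regarding the second point: if one copy is acceptable, the repeated concatenation can at least be reduced to a single copy by collecting the group slices (which are views) in a list first and concatenating once at the end. A minimal sketch with small stand-in data, assuming indices and train_indices shaped as in the question:

```python
import numpy as np

images = np.random.rand(1000, 8, 8)           # small stand-in for the large array
indices = [(0, 300), (301, 580), (581, 860)]  # group ranges, as in the question
train_indices = [0, 2]                        # groups assigned to the train set

# Slicing creates views (no copy); the single concatenate at the end
# performs one copy per result array instead of one copy per group.
train_parts = [images[lo:hi + 1] for i, (lo, hi) in enumerate(indices)
               if i in train_indices]
test_parts = [images[lo:hi + 1] for i, (lo, hi) in enumerate(indices)
              if i not in train_indices]

images_train = np.concatenate(train_parts)
images_test = np.concatenate(test_parts)
```

This still copies the data once into the result arrays, but it avoids the quadratic cost of growing the array inside the loop.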

Some ideas:

  • use slices instead of "fancy indexing", see here
  • images[rng[0] : rng[1]+1], or
  • group_range = slice(rng[0], rng[1]+1); images[group_range]

  • Is images_train = images[train_indices, :, :] and images_test = images[test_indices, :, :] ?

  • images.shape => (50000, 256, 256), i.e. images is 3-dimensional?
  • try whether numpy.where can give some assistance
  • below are the methods I've mentioned

...

import numpy as np

A = np.arange(20); print("A =", A)
B = A[5:16:2]; print("B =", B)                  # view of A only, faster
j = slice(5, 16, 2); C = A[j]; print("C =", C)  # view of A only, faster
k = [2, 4, 8, 12]; D = A[k]; print("D =", D)    # fancy indexing, generates a copy

A = [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
B = [ 5  7  9 11 13 15]
C = [ 5  7  9 11 13 15]
D = [ 2  4  8 12]
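The .fit(images[...]) idea can be pushed further if the estimator supports incremental training (scikit-learn's partial_fit, for example). The Model class below is just a hypothetical stand-in; the point is that each group is passed as a slice, i.e. a view, so a combined images_train array is never materialized:

```python
import numpy as np

class Model:
    """Hypothetical stand-in for an estimator with partial_fit-style training."""
    def __init__(self):
        self.n_seen = 0
    def partial_fit(self, X):
        self.n_seen += len(X)  # a real model would update its weights here

images = np.random.rand(1000, 8, 8)           # small stand-in for the big array
indices = [(0, 300), (301, 580), (581, 860)]  # group ranges as in the question
train_indices = [0, 2]                        # groups assigned to the train set

model = Model()
for i, (lo, hi) in enumerate(indices):
    if i in train_indices:
        group = images[lo:hi + 1]             # slice -> view, no copy
        assert np.shares_memory(images, group)
        model.partial_fit(group)

print(model.n_seen)  # 301 + 280 = 581 training images
```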
pyano