
I'm working with a lot of arrays, and I'd like to know if there's a way to use aliasing so that operations using a subset of the array do not need to "reslice" the array each time the global array is updated.

For example:

import numpy as np

values  = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
index   = np.array([2, 4, 8, 9])
sub_val = values[index]

This returns the following for sub_val:

sub_val = [300 500 900 1000]

If I change the original array:

values += 1

sub_val still returns:

sub_val = [300 500 900 1000]

instead of the desired:

sub_val = [301 501 901 1001]

Based on this, I'm assuming that these index operations create a copy rather than a view. Is there a way to instead have sub_val be an alias (a view) of that subset of the array?

The goal is to do this as efficiently as possible (the subset arrays are used across thousands of iterations).

stagermane

1 Answer


The values of an array are stored in a data buffer, but you never access that directly. The array's indexing methods take care of that, using its dtype, shape, and strides.

A view has its own shape and strides, but shares the data buffer (possibly starting at a different point in the buffer). Thus it 'sees' changes that have been made to the original.
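For contrast, here is a minimal sketch of that view behavior using a basic slice (the array values here are illustrative):

```python
import numpy as np

# A basic slice is a view: it has its own shape/strides but
# shares the data buffer with the original array.
values = np.array([100, 200, 300, 400, 500])
view = values[1:4]

values += 1                            # in-place update of the original
print(view)                            # the view 'sees' the change
print(np.shares_memory(values, view))  # True: same underlying buffer
```

`np.shares_memory` is a convenient way to check whether two arrays overlap in memory.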

But your 'random' indexing can't be expressed in terms of shape and strides. So numpy has to make a copy - an array with its own data buffer (values copied from the original). And that copy does not retain any record of how it was created (i.e. it does not store the index).
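You can verify the copy directly with the arrays from the question:

```python
import numpy as np

values = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
index = np.array([2, 4, 8, 9])

sub_val = values[index]                   # fancy indexing -> new data buffer
print(np.shares_memory(values, sub_val))  # False: sub_val is a copy

values += 1
print(sub_val)                            # unchanged: [300 500 900 1000]
```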

The suggested duplicate had answers that copied values from the original and copied them back, but that doesn't save any of the transaction costs you fear.

My suggestion is to not worry about this indexing cost. Assume numpy is performing this step as efficiently as possible. Focus instead on reducing those thousands of iterations.
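As a sketch of that advice: simply re-gather with values[index] inside the loop. The gather is a single O(len(index)) operation and is typically cheap relative to the global update itself (the loop count and per-iteration work here are placeholders):

```python
import numpy as np

values = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  dtype=float)
index = np.array([2, 4, 8, 9])

for _ in range(3):              # stand-in for the thousands of iterations
    values += 1                 # global in-place update
    sub_val = values[index]     # re-gather a fresh copy each iteration
    # ... work with sub_val ...

print(sub_val)                  # reflects all updates so far
```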

hpaulj