
I'm working with a lot of arrays, and I'd like to know if there's a way to use aliasing so that operations using a subset of the array do not need to "reslice" the array each time the global array is updated.

For example:

import numpy as np

values  = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
index   = np.array([2, 4, 8, 9])
sub_val = values[index]

This returns the following for sub_val:

sub_val = [300 500 900 1000]

If I change the original array:

values += 1

sub_val still returns:

sub_val = [300 500 900 1000]

instead of the desired:

sub_val = [301 501 901 1001]

Based on this, I'm assuming that these index operations create a copy rather than a view. Is there a way to instead have sub_val be an alias (a view) of that subset of the array?

The goal is to do this as efficiently as possible (the subset arrays are used across thousands of iterations).

stagermane

1 Answer


The values of an array are stored in a data buffer, but you never access that directly. The array's indexing methods take care of that, using its dtype, shape, and strides.

A view has its own shape and strides, but shares the data buffer (possibly starting at a different point in the buffer). Thus it 'sees' changes that have been made to the original.
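For contrast, here is a minimal sketch of that view behavior using a basic slice (the array values here are illustrative):

```python
import numpy as np

# A basic slice is a view: it has its own shape/strides but
# shares the data buffer with the original array.
values = np.array([100, 200, 300, 400, 500])
view = values[1:4]

values += 1                            # in-place update of the original
print(view)                            # the view 'sees' the change
print(np.shares_memory(values, view))  # True: same underlying buffer
```

`np.shares_memory` is a convenient way to check whether two arrays overlap in memory.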

But your 'random' indexing can't be expressed in terms of shape and strides. So numpy has to make a copy - an array with its own data buffer (values copied from the original). And that copy does not retain any record of how it was created (i.e. it does not store the index).
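You can verify the copy directly with the arrays from the question:

```python
import numpy as np

values = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
index = np.array([2, 4, 8, 9])

sub_val = values[index]                   # fancy indexing -> new data buffer
print(np.shares_memory(values, sub_val))  # False: sub_val is a copy

values += 1
print(sub_val)                            # unchanged: [300 500 900 1000]
```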

The suggested duplicate had answers that copied values from the original and copied them back, but that doesn't save any of the transaction costs you fear.

My suggestion is to not worry about this indexing cost. Assume numpy is performing this step as efficiently as possible. Focus instead on reducing those thousands of iterations.
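As a sketch of that advice: simply re-gather with values[index] inside the loop. The gather is a single O(len(index)) operation and is typically cheap relative to the global update itself (the loop count and per-iteration work here are placeholders):

```python
import numpy as np

values = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  dtype=float)
index = np.array([2, 4, 8, 9])

for _ in range(3):              # stand-in for the thousands of iterations
    values += 1                 # global in-place update
    sub_val = values[index]     # re-gather a fresh copy each iteration
    # ... work with sub_val ...

print(sub_val)                  # reflects all updates so far
```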

hpaulj