Indexing a numpy array with arrays

Question

I am trying to convert a vanilla Python standard deviation calculation into pure NumPy form. Starting at each index, it takes the standard deviation of a window whose length is set by the variable `number`. However, my NumPy version is faulty and fails with `only integer scalar arrays can be converted to a scalar index`. Is there any way I could get around this?

Variables

import numpy as np
number = 5
list_= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])

Vanilla python

std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number)])

Numpy form

counter = np.arange(0, len(list_)-number, 1)
std = list_[counter:counter+number].std()

You cannot use a `numpy` array (the result of `arange`) as a slice start or stop. `arr[1:10]` is ok, `arr[np.array([1,2,3]):np.array([4,5,6])]` is not! What were you hoping it would produce? — hpaulj, Jan 12 '21 at 23:29

hpaulj · Accepted Answer · 2021-01-12T23:54:53.843

Here `arr` is the question's `list_` array and `number` is 5.

In [46]: std= np.array([arr[i:i+number].std() for i in range(0, len(arr)-number)
    ...: ])
In [47]: std
Out[47]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

We can move the std out of the loop. Make a 2d array of windows, and apply std with axis:

In [48]: np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).std(axis
    ...: =1)
Out[48]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

We could also generate the windows with indexing. A convenient way is to use linspace:

In [63]: idx = np.arange(0,len(arr)-number)
In [64]: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
In [65]: idx
Out[65]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24],
         ...
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        20, 21, 22, 23, 24, 25, 26, 27, 28]])
In [66]: arr[idx].std(axis=0)
Out[66]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

The rolling-window approach using `as_strided` (the `rolling_window` function from the other answer) will probably be faster, but may be harder to understand. Some timings:

In [67]: timeit std= np.array([arr[i:i+number].std() for i in range(0, len(arr)-
    ...: number)])
1.05 ms ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [68]: timeit np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).s
    ...: td(axis=1)
74.7 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [69]: %%timeit
    ...: idx = np.arange(0,len(arr)-number)
    ...: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
    ...: arr[idx].std(axis=0)
117 µs ± 240 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [73]: timeit np.std(rolling_window(arr, 5), 1)
74.5 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using a more direct way to generate the rolling index:

In [81]: %%timeit
    ...: idx = np.arange(len(arr)-number)[:,None]+np.arange(number)
    ...: arr[idx].std(axis=1)
57.9 µs ± 87.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
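
To see what that index looks like, here is a small sketch on a toy array. The names `small`, `w`, `starts` and `idx` are made up for illustration. Adding a column of window starts to a row of offsets broadcasts to a 2d index with one row per window. Note that this sketch uses `len(small) - w + 1` starts so the last full window is included; the timings above use `len(arr) - number`, which stops one window earlier.

```python
import numpy as np

small = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
w = 3  # window length (illustrative value)

# Column of window starts, shape (4, 1)
starts = np.arange(len(small) - w + 1)[:, None]
# Broadcasting against the row of offsets [0, 1, 2] gives shape (4, 3)
idx = starts + np.arange(w)
print(idx)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]

# Advanced indexing copies the data into a (4, 3) array of windows
windows = small[idx]
rolling_std = windows.std(axis=1)  # one std per row, i.e. per window
print(rolling_std)
```

Because `small[idx]` is a copy, memory use grows with the number of windows times the window length.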

Your error:

In [82]: arr[np.array([1,2,3]):np.array([4,5,6])]
Traceback (most recent call last):
  File "<ipython-input-82-3358e59f8fb5>", line 1, in <module>
    arr[np.array([1,2,3]):np.array([4,5,6])]
TypeError: only integer scalar arrays can be converted to a scalar index

Thanks for the detailed explanation. Isn't there a way I can avoid the for loop? I am trying to make my code run faster. — fire fireeyyy, Jan 13 '21 at 01:18
My bad, I didn't mean to come off in a disrespectful way. I just don't really understand the `as_strided` section. — fire fireeyyy, Jan 13 '21 at 01:36
Simply moving `std` out of the loop gave better than 10x improvement. The `as_strided` version isn't fastest, so if you don't understand it, don't worry. The question of how to take multiple slices comes up fairly often. A simple slice is fast (a view), but multiple ones require some sort of copy - advanced indexing or concatenate. — hpaulj, Jan 13 '21 at 01:46
Yeah, I thought I could use the last one, which takes `57.9 µs`, in my program, which works on a very long list of more than 2 million elements, but it crashed with the error message `Unable to allocate 14.2 GiB for an array with shape (2640651, 1440) and data type int32`. But thanks anyway, man. — fire fireeyyy, Jan 13 '21 at 02:19
Yes, for larger arrays, iteration as you initially did may be necessary. Even if you don't have memory errors, there are time tradeoffs between memory management and iteration. You could also look into using `numba` to compile the task. — hpaulj, Jan 13 '21 at 05:55
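
One possible middle ground between a full index array and a pure Python loop is to build the broadcast index in blocks. The sketch below is only an illustration under assumptions: `rolling_std_chunked` and its `chunk` parameter are hypothetical names, not something from the answers above, and it includes the last full window (`len(a) - window + 1` windows in total).

```python
import numpy as np

def rolling_std_chunked(a, window, chunk=10_000):
    """Population std (ddof=0) of each full window, `chunk` windows at a time.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    n_windows = a.shape[0] - window + 1
    out = np.empty(max(n_windows, 0))
    offsets = np.arange(window)
    for start in range(0, n_windows, chunk):
        stop = min(start + chunk, n_windows)
        # Index block of shape (stop - start, window) instead of one
        # (n_windows, window) array for the whole input
        idx = np.arange(start, stop)[:, None] + offsets
        out[start:stop] = a[idx].std(axis=1)
    return out

# Small check against the plain loop on random data
rng = np.random.default_rng(0)
data = rng.normal(size=1_000)
expected = np.array([data[i:i + 50].std() for i in range(len(data) - 50 + 1)])
result = rolling_std_chunked(data, 50, chunk=128)
print(np.allclose(result, expected))
```

Each block needs memory on the order of `chunk * window` values rather than `n_windows * window`, so `chunk` can be lowered until the temporaries fit. The Python loop now runs once per block instead of once per window.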

score 0 · Answer 2 · answered Jan 12 '21 at 23:38

As taken from "Rolling window for 1D arrays in Numpy?":

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

np.std(rolling_window(list_, 5), 1)
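
If the hand-written strides are hard to follow, NumPy 1.20 and later ship `sliding_window_view`, which builds the same kind of windowed view. A minimal sketch, assuming that NumPy version is available; `values` is just the first few numbers from the question, used here for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

values = np.array([457.334015, 424.440002, 394.795990, 408.903992,
                   398.821014, 402.152008, 435.790985])
window = 5

# Read-only view of shape (len(values) - window + 1, window) == (3, 5);
# no data is copied when the view is created
windows = sliding_window_view(values, window)

# One standard deviation per window
rolling_std = windows.std(axis=1)
print(rolling_std)
```

The view itself is cheap, but `std` still allocates temporaries as large as the full set of windows, so very long inputs with wide windows can run out of memory here as well.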

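By the way, your vanilla Python code stops one window early: `range(0, len(list_)-number)` never reaches the last full window, which starts at index `len(list_)-number`. It should be: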

std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number+1)])

answered Jan 12 '21 at 23:38

armamut

I am trying to make my code run faster. Is there a way I could write the function without a for loop? – fire fireeyyy Jan 13 '21 at 01:23
Use the code I wrote in the first box. It should give the results you need without a for loop. – armamut Jan 13 '21 at 06:40
