1

I am trying to convert a vanilla Python standard deviation calculation, which works on windows of n consecutive elements (n is set by the variable number), into NumPy form. However, the NumPy code is faulty: it raises "only integer scalar arrays can be converted to a scalar index". Is there any way I could bypass this?

Variables

import numpy as np
number = 5
list_= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])

Vanilla python

std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number)])

Numpy form

counter = np.arange(0, len(list_)-number, 1)
std = list_[counter:counter+number].std()
fire fireeyyy
  • You cannot use a `numpy` array (the result of `arange`) as a slice start or stop. `arr[1:10]` is OK; `arr[np.array([1,2,3]):np.array([4,5,6])]` is not! What were you hoping it would produce? – hpaulj Jan 12 '21 at 23:29

2 Answers

1
In [46]: std = np.array([arr[i:i+number].std() for i in range(0, len(arr)-number)])
In [47]: std
Out[47]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

We can move the std out of the loop. Make a 2d array of windows, and apply std with axis:

In [48]: np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).std(axis=1)
Out[48]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

We could also generate the windows with indexing. A convenient way is to use linspace:

In [63]: idx = np.arange(0,len(arr)-number)
In [64]: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
In [65]: idx
Out[65]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24],
         ...
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        20, 21, 22, 23, 24, 25, 26, 27, 28]])
In [66]: arr[idx].std(axis=0)
Out[66]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

The rolling-window approach using as_strided (the rolling_window function from the other answer) will probably be faster than the loop, but may be harder to understand. Timings:

In [67]: timeit std = np.array([arr[i:i+number].std() for i in range(0, len(arr)-number)])
1.05 ms ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [68]: timeit np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).std(axis=1)
74.7 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [69]: %%timeit
    ...: idx = np.arange(0,len(arr)-number)
    ...: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
    ...: arr[idx].std(axis=0)
117 µs ± 240 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [73]: timeit np.std(rolling_window(arr, 5), 1)
74.5 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using a more direct way to generate the rolling index:

In [81]: %%timeit
    ...: idx = np.arange(len(arr)-number)[:,None]+np.arange(number)
    ...: arr[idx].std(axis=1)
57.9 µs ± 87.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
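
To make the broadcasting step above easier to follow, here is a minimal sketch on a short, hypothetical sample array (the values and the window length of 3 are chosen only for illustration). It builds the same kind of index grid and checks it against the original list comprehension:

```python
import numpy as np

# Hypothetical sample data; any 1-D float array works.
arr = np.array([457.33, 424.44, 394.80, 408.90, 398.82, 402.15, 435.79, 423.20])
number = 3

# A column of window starts, shape (n, 1), plus a row of offsets, shape (number,),
# broadcasts to an (n, number) grid where row i holds i, i+1, ..., i+number-1.
starts = np.arange(len(arr) - number)[:, None]
offsets = np.arange(number)
idx = starts + offsets

# Advanced indexing copies the selected values into an (n, number) array.
windows = arr[idx]
rolling_std = windows.std(axis=1)

# Reference result from the loop used in the question.
loop_std = np.array([arr[i:i + number].std() for i in range(len(arr) - number)])

print(idx[:2])                             # first two rows of the index grid
print(np.allclose(rolling_std, loop_std))  # compares the two results
```

Because `arr[idx]` is a copy, memory use grows with the number of windows times the window length.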

Your error:

In [82]: arr[np.array([1,2,3]):np.array([4,5,6])]
Traceback (most recent call last):
  File "<ipython-input-82-3358e59f8fb5>", line 1, in <module>
    arr[np.array([1,2,3]):np.array([4,5,6])]
TypeError: only integer scalar arrays can be converted to a scalar index
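
As a small sketch of the difference (the array `arr` here is a hypothetical stand-in, not the data from the question): plain integers work as slice bounds, arrays of bounds raise the `TypeError`, and a 2-D integer index array is one way to take several equal-length windows at once.

```python
import numpy as np

arr = np.arange(10.0)  # hypothetical stand-in data

# Plain integers are valid slice bounds.
single_window = arr[1:4]

# Arrays holding several starts/stops are not valid slice bounds.
error_message = None
try:
    arr[np.array([1, 2, 3]):np.array([4, 5, 6])]
except TypeError as exc:
    error_message = str(exc)

# One alternative: build a 2-D integer index (one row per window) and
# use advanced indexing instead of slicing.
starts = np.array([1, 2, 3])
windows = arr[starts[:, None] + np.arange(3)]

print(single_window)
print(error_message)
print(windows)
```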
hpaulj
  • Thanks for the detailed explanation. Isn't there a way I can avoid the for loop? I am trying to make my code run faster. – fire fireeyyy Jan 13 '21 at 01:18
  • My bad, I didn't mean to come off in a disrespectful way. I just don't really understand the `as_strided` section. – fire fireeyyy Jan 13 '21 at 01:36
  • Simply moving `std` out of the loop gave better than 10x improvement. The `as_strided` version isn't fastest, so if you don't understand it, don't worry. The question of how to take multiple slices comes up fairly often. A simple slice is fast (a view), but multiple ones require some sort of copy: advanced indexing or concatenate. – hpaulj Jan 13 '21 at 01:46
  • Yeah, I thought I could use the last one, which takes `57.9 µs`, in my program, which works on a very long list of more than 2 million elements. It crashed with the error message `Unable to allocate 14.2 GiB for an array with shape (2640651, 1440) and data type int32`. But thanks anyway, man. – fire fireeyyy Jan 13 '21 at 02:19
  • Yes, for larger arrays, iteration as you initially did may be necessary. Even if you don't have memory errors, there are time tradeoffs between memory management and iteration. You could also look into using `numba` to compile the task. – hpaulj Jan 13 '21 at 05:55
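
For inputs as large as the one described in the comments above, a possible middle ground between full iteration and one huge index array is to process the windows in chunks. The following is only a sketch, not something from the answers: the function name `rolling_std_chunked` and the `chunk` parameter are hypothetical, and it includes the final window (so it returns `len(a) - window + 1` values).

```python
import numpy as np

def rolling_std_chunked(a, window, chunk=4096):
    """Rolling population std (ddof=0), computed `chunk` windows at a time.

    Hypothetical helper: peak extra memory is roughly proportional to
    chunk * window instead of n_windows * window.
    """
    a = np.asarray(a, dtype=float)
    n_windows = len(a) - window + 1
    if n_windows <= 0:
        return np.empty(0)
    out = np.empty(n_windows)
    offsets = np.arange(window)
    for start in range(0, n_windows, chunk):
        stop = min(start + chunk, n_windows)
        # Index grid for this chunk only: shape (stop - start, window).
        idx = np.arange(start, stop)[:, None] + offsets
        out[start:stop] = a[idx].std(axis=1)
    return out

# Small usage example with random data and a deliberately tiny chunk size.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
chunked = rolling_std_chunked(x, 5, chunk=7)
reference = np.array([x[i:i + 5].std() for i in range(len(x) - 5 + 1)])
print(np.allclose(chunked, reference))
```

The chunk size trades memory for the number of Python-level iterations; a suitable value depends on the window length and available RAM.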
0

As taken from "Rolling window for 1D arrays in Numpy?":

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

np.std(rolling_window(list_, 5), 1)

By the way, your vanilla Python code is wrong: the range stops one short, so the last window is dropped. It should be:

std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number+1)])
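
As a cross-check of the corrected range, here is a minimal sketch that uses `sliding_window_view` (available in NumPy 1.20 and later) instead of a hand-written `as_strided` call. It assumes a recent NumPy and uses only the first twelve values from the question to keep the example short:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

# First twelve values from the question's data.
list_ = np.array([457.334015, 424.440002, 394.795990, 408.903992,
                  398.821014, 402.152008, 435.790985, 423.204987,
                  411.574005, 404.424988, 399.519989, 377.181000])
number = 5

# Read-only view of shape (len(list_) - number + 1, number); no data is copied here.
windows = sliding_window_view(list_, number)
std_view = windows.std(axis=1)

# Loop with the corrected upper bound, which includes the last window.
std_loop = np.array([list_[i:i + number].std()
                     for i in range(0, len(list_) - number + 1)])

print(windows.shape)
print(np.allclose(std_view, std_loop))
```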
armamut