
Is there any way I could compute the standard deviation the same way as the y_mean and xy_mean NumPy versions? I don't want to use a for loop or a function that uses a lot of RAM to calculate the standard deviation. I am trying to use the np.convolve() function to calculate the standard deviation std.

variables:

import numpy as np

number = 5
PC_list= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])

Vanilla Python versions (evaluated once per window, where i is the window's start index):

y_mean = sum(PC_list[i:i+number])/number
xy_mean = sum([x * (j + 1) for j, x in enumerate(PC_list[i:i+number])])/number
std = (sum([(k - y_mean)**2 for k in PC_list[i:i+number]])/(number-1))**0.5
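For reference, here is a minimal sketch of the slow baseline: the three expressions above evaluated in an explicit loop over every full window. It assumes all `len(PC_list) - number + 1` windows are wanted (the NumPy lines further down additionally drop the last one with `[:-1]`). The list names `y_means`, `xy_means` and `stds` are illustrative only.

```python
import numpy as np

number = 5
PC_list = np.array([
    457.334015, 424.440002, 394.795990, 408.903992, 398.821014, 402.152008,
    435.790985, 423.204987, 411.574005, 404.424988, 399.519989, 377.181000,
    375.467010, 386.944000, 383.614990, 375.071991, 359.511993, 328.865997,
    320.510010, 330.079010, 336.187012, 352.940002, 365.026001, 361.562012,
    362.299011, 378.549011, 390.414001, 400.869995, 394.773010, 382.556000,
])

y_means, xy_means, stds = [], [], []
for i in range(len(PC_list) - number + 1):  # i = start index of each full window
    window = PC_list[i:i + number]
    y_mean = sum(window) / number
    # weights 1..number, as in the original enumerate(...) with (j + 1)
    xy_mean = sum(x * (j + 1) for j, x in enumerate(window)) / number
    # sample standard deviation (divisor number - 1)
    std = (sum((k - y_mean) ** 2 for k in window) / (number - 1)) ** 0.5
    y_means.append(y_mean)
    xy_means.append(xy_mean)
    stds.append(std)

print(len(stds))  # 26 windows for 30 values and number = 5
```

This loop is what any vectorized version should reproduce, so it is handy as a correctness check even if it is too slow for real data.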

Numpy versions:

y_mean = (np.convolve(PC_list, np.ones(shape=(number)), mode='valid')/number)[:-1]
xy_mean = (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid')/number)[:-1]
std = ?
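Why `np.arange(number, 0, -1)` works for `xy_mean`: np.convolve flips its second argument before sliding it along the data, so the descending kernel `[number, ..., 1]` is applied to each window as the ascending weights `1..number`, matching `enumerate(...)` with `(j + 1)`. A small sketch checking this on made-up toy data (the array `x` is hypothetical):

```python
import numpy as np

number = 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical toy data

# np.convolve reverses the kernel, so [3, 2, 1] acts as weights [1, 2, 3]
conv = np.convolve(x, np.arange(number, 0, -1), mode='valid')

# The same weighted sums computed window by window for comparison
weights = np.arange(1, number + 1)
direct = np.array([np.dot(x[i:i + number], weights)
                   for i in range(len(x) - number + 1)])

print(conv)    # [14. 20. 26.]
print(direct)  # [14. 20. 26.]
```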
tony selcuk
  • by the way it's known as *list comprehension* – Moinuddin Quadri Jan 23 '21 at 19:00
  • https://numpy.org/doc/stable/reference/generated/numpy.std.html – paleonix Jan 23 '21 at 19:09
  • Before trying to answer this, it may be worth while scanning the OP's previous questions seeking the `std` moving windows like this. He's been trying, over a half dozen questions, to speed up an iterative calculation. – hpaulj Jan 23 '21 at 19:14

1 Answer

You can use np.lib.stride_tricks.as_strided and np.std with ddof=1:

>>> np.std(
        np.lib.stride_tricks.as_strided(
            PC_list, 
            shape=(PC_list.shape[0] - number + 1, number), 
            strides=PC_list.strides*2
        ), 
        axis=-1, 
        ddof=1
    )
array([25.35313557, 11.6209317 , 16.32415133, 15.46019574, 15.29513506,
       14.02947067, 14.68620846, 17.04664993, 16.38348865, 12.9925946 ,
        9.58525968,  5.32623099, 10.61466493, 23.71209646, 27.85489139,
       23.31091745, 14.78211757, 12.11214834, 17.90301391, 15.42895731,
       11.7602241 ,  9.27171536, 12.57714149, 17.25865608, 15.2717403 ,
        9.02825105])

Alternatively, you can use pandas.Series.rolling followed by .std() (which uses ddof=1 by default), then pandas.Series.dropna to drop the leading NaN values and pandas.Series.to_numpy:

>>> import pandas as pd
>>> pd.Series(PC_list).rolling(number).std().dropna().to_numpy()
array([25.35313557, 11.6209317 , 16.32415133, 15.46019574, 15.29513506,
       14.02947067, 14.68620846, 17.04664993, 16.38348865, 12.9925946 ,
        9.58525968,  5.32623099, 10.61466493, 23.71209646, 27.85489139,
       23.31091745, 14.78211757, 12.11214834, 17.90301391, 15.42895731,
       11.7602241 ,  9.27171536, 12.57714149, 17.25865608, 15.2717403 ,
        9.02825105])

Explanation: np.lib.stride_tricks.as_strided returns a view of the array in which each row is one rolling window of length number, without copying the data:

>>> np.lib.stride_tricks.as_strided(
            PC_list, 
            shape=(PC_list.shape[0] - number + 1, number), 
            strides=PC_list.strides*2
        )

array([[457.334015, 424.440002, 394.79599 , 408.903992, 398.821014],   #index: 0,1,2,3,4
       [424.440002, 394.79599 , 408.903992, 398.821014, 402.152008],   #index: 1,2,3,4,5
       [394.79599 , 408.903992, 398.821014, 402.152008, 435.790985],   #index: 2,3,4,5,6
       [408.903992, 398.821014, 402.152008, 435.790985, 423.204987],   # ... and so on
       [398.821014, 402.152008, 435.790985, 423.204987, 411.574005],
       [402.152008, 435.790985, 423.204987, 411.574005, 404.424988],
       [435.790985, 423.204987, 411.574005, 404.424988, 399.519989],
       [423.204987, 411.574005, 404.424988, 399.519989, 377.181   ],
       [411.574005, 404.424988, 399.519989, 377.181   , 375.46701 ],
       [404.424988, 399.519989, 377.181   , 375.46701 , 386.944   ],
       [399.519989, 377.181   , 375.46701 , 386.944   , 383.61499 ],
       [377.181   , 375.46701 , 386.944   , 383.61499 , 375.071991],
       [375.46701 , 386.944   , 383.61499 , 375.071991, 359.511993],
       [386.944   , 383.61499 , 375.071991, 359.511993, 328.865997],
       [383.61499 , 375.071991, 359.511993, 328.865997, 320.51001 ],
       [375.071991, 359.511993, 328.865997, 320.51001 , 330.07901 ],
       [359.511993, 328.865997, 320.51001 , 330.07901 , 336.187012],
       [328.865997, 320.51001 , 330.07901 , 336.187012, 352.940002],
       [320.51001 , 330.07901 , 336.187012, 352.940002, 365.026001],
       [330.07901 , 336.187012, 352.940002, 365.026001, 361.562012],
       [336.187012, 352.940002, 365.026001, 361.562012, 362.299011],
       [352.940002, 365.026001, 361.562012, 362.299011, 378.549011],
       [365.026001, 361.562012, 362.299011, 378.549011, 390.414001],
       [361.562012, 362.299011, 378.549011, 390.414001, 400.869995],
       [362.299011, 378.549011, 390.414001, 400.869995, 394.77301 ],
       [378.549011, 390.414001, 400.869995, 394.77301 , 382.556   ]])

Taking the std of the above array along the last axis gives the rolling std. By default NumPy uses ddof=0 (Delta Degrees of Freedom = 0), which means that for number samples the divisor is number - 0. Since you want a divisor of number - 1, you need ddof=1.
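On NumPy 1.20 or newer, `np.lib.stride_tricks.sliding_window_view` builds the same rolling view without computing strides by hand, and by default it returns a read-only view, which avoids the accidental-write risk that `as_strided` carries. A minimal sketch, assuming NumPy >= 1.20; it also checks the ddof relationship described above, namely that the ddof=1 result equals the ddof=0 result scaled by sqrt(number / (number - 1)):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

number = 5
PC_list = np.array([
    457.334015, 424.440002, 394.795990, 408.903992, 398.821014, 402.152008,
    435.790985, 423.204987, 411.574005, 404.424988, 399.519989, 377.181000,
    375.467010, 386.944000, 383.614990, 375.071991, 359.511993, 328.865997,
    320.510010, 330.079010, 336.187012, 352.940002, 365.026001, 361.562012,
    362.299011, 378.549011, 390.414001, 400.869995, 394.773010, 382.556000,
])

windows = sliding_window_view(PC_list, number)  # read-only view, one window per row
std_sample = np.std(windows, axis=-1, ddof=1)   # divisor number - 1
std_population = np.std(windows, axis=-1)       # default ddof=0, divisor number

# ddof=1 and ddof=0 differ only by the constant factor sqrt(number / (number - 1))
assert np.allclose(std_sample, std_population * np.sqrt(number / (number - 1)))
print(windows.shape)  # (26, 5)
```

Like the `as_strided` version, the view itself costs no extra memory, but `np.std` still allocates temporaries of the full (windows, number) shape while computing.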

Sayandip Dutta
  • In previous questions, we have already suggested `as_strided`. While the result is a `view`, `std` does `X-X.mean()` as part of its calculation (as in his `(k - y_mean)**2`). That will make a copy, and blow up his memory. The OP isn't very good at explaining what he's learned from previous questions. https://stackoverflow.com/questions/65768068/memory-error-utilizing-numpy-arrays-python, https://stackoverflow.com/questions/65757073/optimizing-calculations-with-numpy-and-numba-python – hpaulj Jan 23 '21 at 20:48
  • @hpaulj I see. This is quite an important bit of information that was left out. I would suggest that the OP at least link his previous posts in the question, if it seems too hard to explain. – Sayandip Dutta Jan 23 '21 at 22:02
  • @SayandipDutta Thank you, the `pd.Series(PC_list).rolling(number).std().dropna().to_numpy()` function is outstanding; it was what I was looking for. I could apply this to `xy_mean` and `y_mean` as well. – tony selcuk Jan 23 '21 at 22:07
  • @SayandipDutta I have successfully applied the function to my `y_mean` and `std`, but I couldn't find a way to apply it to `xy_mean`. I have made a post about it here: [xy_mean issue](https://stackoverflow.com/questions/65866920/implementing-pandas-function-to-numpy-functions), asking whether it can be expressed with that function as well. I would appreciate it if you could take a look at that post. – tony selcuk Jan 24 '21 at 03:54
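Regarding hpaulj's memory concern, and the question's original wish to use np.convolve: a rolling sample std can be assembled from two convolutions, in the same style as the question's y_mean line, using the identity sum((x - mean)^2) = sum(x^2) - (sum x)^2 / n. Every intermediate array has about len(PC_list) elements, so no (windows x number) temporary is created. This is a minimal sketch that was not part of the original thread; the function name `rolling_std_convolve` is hypothetical. The identity subtracts two large, nearly equal numbers, so for data whose spread is tiny compared with its magnitude it can lose precision; centering the data on its global mean first (variance is shift-invariant) reduces that risk.

```python
import numpy as np


def rolling_std_convolve(values, number, ddof=1):
    """Rolling standard deviation over every full window of length `number`.

    Uses only np.convolve, so memory stays proportional to len(values)
    instead of len(values) * number.
    """
    values = np.asarray(values, dtype=float)
    # Variance is shift-invariant; centering reduces cancellation error
    centered = values - values.mean()
    kernel = np.ones(number)
    window_sum = np.convolve(centered, kernel, mode='valid')
    window_sq_sum = np.convolve(centered ** 2, kernel, mode='valid')
    # sum((x - window_mean)^2) = sum(x^2) - (sum x)^2 / number
    ss = window_sq_sum - window_sum ** 2 / number
    ss = np.maximum(ss, 0.0)  # clip tiny negatives caused by round-off
    return np.sqrt(ss / (number - ddof))


number = 5
PC_list = np.array([
    457.334015, 424.440002, 394.795990, 408.903992, 398.821014, 402.152008,
    435.790985, 423.204987, 411.574005, 404.424988, 399.519989, 377.181000,
    375.467010, 386.944000, 383.614990, 375.071991, 359.511993, 328.865997,
    320.510010, 330.079010, 336.187012, 352.940002, 365.026001, 361.562012,
    362.299011, 378.549011, 390.414001, 400.869995, 394.773010, 382.556000,
])

std = rolling_std_convolve(PC_list, number)  # ddof=1, divisor number - 1
print(std.shape)  # (26,)
```

To line up with the question's y_mean and xy_mean lines, which drop the last window with `[:-1]`, slice the result the same way: `rolling_std_convolve(PC_list, number)[:-1]`.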