7

I am trying to verify the ewm.std calculations of pandas so that I can implement a one step update for my code. Here is the complete description of the problem with code.

mrt = pd.Series(np.random.randn(1000))
N = 100
a = 2/(1+N)
bias = (2-a)/2/(1-a)
x = mrt.iloc[-2]
ma = mrt.ewm(span=N).mean().iloc[-3]
var = mrt.ewm(span=N).var().iloc[-3]
ans = mrt.ewm(span=N).std().iloc[-2]
print(np.sqrt( bias*(1-a) * (var + a * (x- ma)**2)), ans)

(1.1352524643949702, 1.1436193844674576)

I have used standard formulation. Could somebody tell me why the two values should not be same? i.e. how is pandas calculating the exponentially weighted std?

EDIT: After Julien's answer - let me give one more use case. I am plotting the ratio of the var calculated by pandas and using the formula I inferred from the Cython code of pandas ewm-covariance. This ratio should be 1. (I am guessing there is a problem with my formula, if somebody can point it out).

mrt = pd.Series(np.random.randn(1000))

N = 100
a = 2./(1+N)
bias = (2-a)/2./(1-a)
mewma = mrt.ewm(span=N).mean()

var_pandas = mrt.ewm(span=N).var()
var_calculated = bias * (1-a) * (var_pandas.shift(1) + a * (mrt-mewma.shift(1))**2)

(var_calculated/var_pandas).plot()

The plot shows the problem clearly.

plot of the ratio after the initial values are removed

EDIT 2: By trial and error, I figured out the right formula:

var_calculated = (1-a) * (var_pandas.shift(1) + bias * a * (mrt-mewma.shift(1))**2)

But I am not convinced that it should be the right one! Can somebody put light on that?

halfer
  • 19,824
  • 17
  • 99
  • 186
manav
  • 217
  • 1
  • 2
  • 11
  • 1
    potential duplicate of [this question](http://stackoverflow.com/questions/37924377/does-pandas-calculate-ewm-wrong)? – Julien Marrec Nov 23 '16 at 00:56
  • @JulienMarrec No. I can verify the ewma exactly. I am getting trumped at ewmstd. – manav Nov 23 '16 at 01:00
  • ewm is defined here: [window.py#L1387](https://github.com/pandas-dev/pandas/blob/master/pandas/core/window.py#L1387). ewm.std call is [here](https://github.com/pandas-dev/pandas/blob/master/pandas/core/window.py#L1555).- and then goes to _zqrst [here](https://github.com/pandas-dev/pandas/blob/master/pandas/core/window.py#L1761) – Julien Marrec Nov 23 '16 at 01:04
  • Yup! I looked at the cython code as well. That is where I derived the bias term from. There can be some difference due to how pandas implements it, but the order of difference here is much more than that would imply. – manav Nov 23 '16 at 01:09
  • There are (broken) equations in the [doc](http://pandas.pydata.org/pandas-docs/stable/computation.html#exponentially-weighted-windows). I assume you've read that too. – Julien Marrec Nov 23 '16 at 01:15
  • 1
    Yes I did read all of it. Somebody would be able to use my example to find where I went wrong, is my hope. It's a simple thing but is so baffling that I can't get it exactly right! – manav Nov 23 '16 at 01:26
  • isn't bias=False by default? (Away from computer) – Julien Marrec Nov 23 '16 at 01:29
  • bias toggle will not affect things at second decimal place for longer series. I have tried those things as well to exclude that as the cause of the order of error I show here in the example. There is something more fundamentally wrong in one of the calculations, most probably mine. – manav Nov 23 '16 at 01:33
  • I am trying to find the ewm.std formula as well. I copy-pasted your code (with your edited calculation) and got [this](https://i.imgur.com/7CDGgxG.png) result. Definitely not the "exact match" that you mention below. Has something changed in a recent version of Pandas? – 0x5453 Oct 18 '17 at 17:50

3 Answers3

7

According to the documentation of the ewm function, the default flag adjust=True is used. As it is explained in the links below, the exponentially weighted moving values are not computed using the recursive relations, but instead using weights. This is justified, particularly for the cases when the length of the series is small.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html https://github.com/pandas-dev/pandas/issues/8861

This means that the ewma and ewmvar are computed as normal weighted average and var with the weights being the exponentially decreasing factors

mrt_array = np.array(mrt.tolist())
M = len(mrt_array)
weights = (1-a)**np.arange(M-1, -1, -1) # This is reverse order to match Series order
ewma = sum(weights * mrt_array) / sum(weights)
bias = sum(weights)**2 / (sum(weights)**2 - sum(weights**2))
ewmvar = bias * sum(weights * (mrt_array - ewma)**2) / sum(weights)
ewmstd = np.sqrt(ewmvar)
kosnik
  • 2,342
  • 10
  • 23
  • 1
    Thank you. Used this code to answer similar question [here](https://stackoverflow.com/questions/58809786/pandas-ewm-var-and-std/62434023#62434023). upvoted. – Nilesh Ingle Jun 17 '20 at 16:42
  • How do you derive this formula theoretically? Thanks! – GabCaz Aug 31 '23 at 14:09
5

Your question actually actually reduces to how pandas calculate ewm.var()

In [1]:
(np.sqrt(mrt.ewm(span=span).var()) == mrt.ewm(span=span).std())[1:].value_counts()

Out[1]:
True    999
dtype: int64

So in your example above: ans == np.sqrt(mrt.ewm(span=N).var().iloc[-2]).

To investigate how it calculates ewmvar(), it does it by calling emcov with input_x=input_y=mrt


If we check for the first elements:

mrt.ewm(span=span).var()[:2].values
> array([nan,  0.00555309])

Now, using the emcov routine, and applying it to our specific case:

x0 = mrt.iloc[0]
x1 = mrt.iloc[1]
x2 = mrt.iloc[2]

# mean_x and mean_y are both the same, here we call it y
# This is the same as mrt.ewm(span=span).mean(), I verified that too
y0 = x0
# y1 = mrt.ewm(span=span).mean().iloc[1]
y1 = ((1-alpha)*y0 + x1)/(1+(1-alpha))
#y2 = (((1-alpha)**2+(1-alpha))*y1 + x2) / (1 + (1-alpha) + (1-alpha)**2) 

cov0 = 0

cov1 = (((1-alpha) * (cov0 + ((y0 - y1)**2))) +
                (1 * ((x1 - y1)**2))) / (1 + (1-alpha))

# new_wt = 1, sum_wt0 = (1-alpha), sum_wt2 = (1-alpha)**2
sum_wt = 1+(1-alpha)
sum_wt2 =1+(1-alpha)**2


numerator = sum_wt * sum_wt # (1+(1-alpha))^2 = 1 + 2(1-alpha) + (1-alpha)^2
denominator = numerator - sum_wt2 # # 2*(1-alpha)


print(np.nan,cov1*(numerator / denominator))

>(nan, 0.0055530905712123432)
Julien Marrec
  • 11,605
  • 4
  • 46
  • 63
  • Thanks Julien. I think what you did is fine. Can you confirm if you get the same equality for the last step? Please see the edit above in the question. – manav Nov 24 '16 at 00:21
  • Yes, confirmed. – Julien Marrec Nov 24 '16 at 10:47
  • Thanks! So there is some trouble with my formula then. By trial and error I found that the correct formula should be var_calculated = (1-a) * (var_pandas.shift(1) + bias * a * (mrt-mewma.shift(1))**2). This gives exact match! But I am not sure why :) Thanks for all the trouble @Julien – manav Nov 24 '16 at 17:28
  • Glad you figured it out. If you could at least upvote if that was helpful I'd appreciate it – Julien Marrec Nov 24 '16 at 18:45
  • 1
    I did. but it doesn't show up because I have less than 15 reputation! – manav Nov 24 '16 at 23:36
  • @ma327: you should be able to accept this answer, by clicking on the tick/check mark to the left. – halfer Nov 27 '16 at 13:52
1

Thank you @kosnik for the answer above. Copy-pasted your code below and build it up on it to answer the question here. Thus calculated exponential moving variance and standard deviation for entire dataset. The calculated values match with the output of .ewm().

# Import libraries
import numpy as np
import pandas as pd

# Create DataFrame
mrt = pd.Series(np.random.randn(1000))
df = pd.DataFrame(data=mrt, columns=['data'])


# Initialize 
N = 3 # Span
a = 2./(1+N) # Alpha

# Use .evm() to calculate 'exponential moving variance' directly
var_pandas = df.ewm(span=N).var()
std_pandas = df.ewm(span=N).std()

# Initialize variable
varcalc=[]
stdcalc=[]

# Calculate exponential moving variance
for i in range(0,len(df.data)):

    z = np.array(df.data.iloc[0:i+1].tolist())

    # Get weights: w
    n = len(z)
    w = (1-a)**np.arange(n-1, -1, -1) # This is reverse order to match Series order

    # Calculate exponential moving average
    ewma = np.sum(w * z) / np.sum(w)

    # Calculate bias
    bias = np.sum(w)**2 / (np.sum(w)**2 - np.sum(w**2))

    # Calculate exponential moving variance with bias
    ewmvar = bias * np.sum(w * (z - ewma)**2) / np.sum(w)

    # Calculate standard deviation
    ewmstd = np.sqrt(ewmvar)

    # Append
    varcalc.append(ewmvar)
    stdcalc.append(ewmstd)
    #print('ewmvar:',ewmvar)


#varcalc
df['var_pandas'] = var_pandas
df['varcalc'] = varcalc
df['std_pandas'] = std_pandas
df['stdcalc'] = stdcalc
df

enter image description here

Nilesh Ingle
  • 1,777
  • 11
  • 17