I am trying to verify the ewm.std calculations of pandas so that I can implement a one-step update for my code. Here is a complete description of the problem, with code.
import numpy as np
import pandas as pd

mrt = pd.Series(np.random.randn(1000))
N = 100
a = 2. / (1 + N)               # smoothing factor implied by span=N
bias = (2 - a) / 2. / (1 - a)  # asymptotic debiasing factor for the unbiased variance
x = mrt.iloc[-2]
ma = mrt.ewm(span=N).mean().iloc[-3]
var = mrt.ewm(span=N).var().iloc[-3]
ans = mrt.ewm(span=N).std().iloc[-2]
print(np.sqrt(bias * (1 - a) * (var + a * (x - ma) ** 2)), ans)
(1.1352524643949702, 1.1436193844674576)
I have used the standard formulation. Could somebody tell me why the two values are not the same, i.e. how is pandas calculating the exponentially weighted std?
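To make the comparison explicit, here is the same check written as a one-step update function (the function name is mine; it wraps exactly the expression in the print above):

def ewm_std_candidate(x_new, mean_prev, var_prev, a, bias):
    # my candidate one-step update for ewm.std: previous EW mean/var plus the new observation
    return np.sqrt(bias * (1 - a) * (var_prev + a * (x_new - mean_prev) ** 2))

print(ewm_std_candidate(x, ma, var, a, bias), ans)  # same mismatch as above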
EDIT: After Julien's answer, let me give one more use case. I am plotting the ratio of the variance calculated by pandas to the variance from the formula I inferred from the Cython code of pandas' ewm covariance. This ratio should be 1. (I am guessing there is a problem with my formula; if so, can somebody point it out?)
mrt = pd.Series(np.random.randn(1000))
N = 100
a = 2. / (1 + N)
bias = (2 - a) / 2. / (1 - a)
mewma = mrt.ewm(span=N).mean()
var_pandas = mrt.ewm(span=N).var()
# my formula, applied to the whole series at once
var_calculated = bias * (1 - a) * (var_pandas.shift(1) + a * (mrt - mewma.shift(1)) ** 2)
(var_calculated / var_pandas).plot()
The plot shows the problem clearly.
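For anyone without a plotting backend handy, the same check can be done numerically by continuing from the snippet above:

ratio = var_calculated / var_pandas
print(ratio.tail())  # prints, for the last few points, the same information the plot shows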
EDIT 2: By trial and error, I figured out the right formula:
var_calculated = (1 - a) * (var_pandas.shift(1) + bias * a * (mrt - mewma.shift(1)) ** 2)
But I am not convinced that this should be the right one! Can somebody shed light on that?
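For reference, here is what that formula looks like as the one-step update I was originally after. The function name is mine, and the simple recursive mean below is only the large-t limit of pandas' default adjust=True mean, so I would only expect close agreement with pandas once the window has effectively warmed up:

def ewm_update(x_new, mean_prev, var_prev, a, bias):
    # one-step update implied by the EDIT 2 formula
    diff = x_new - mean_prev
    var_new = (1 - a) * (var_prev + bias * a * diff ** 2)  # unbiased EW variance
    mean_new = (1 - a) * mean_prev + a * x_new             # EW mean, asymptotic (adjust=False-style) form
    return mean_new, var_new

# check against pandas at the last index, where the window has long since warmed up
m_new, v_new = ewm_update(mrt.iloc[-1], mewma.iloc[-2], var_pandas.iloc[-2], a, bias)
print(v_new, var_pandas.iloc[-1])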