In order to optimize pandas' EWM mean calculation, I'm replicating it using the numba library. However, I'm unable to figure out how the calculation is done when nan values are present.
The documentation states the following:
When ignore_na is False (default), weights are based on absolute positions. For example, the weights of x and y used in calculating the final weighted average of ... (1-alpha)**2 and alpha (if adjust is False).
If setting span to 2 with the array [1, None, 2], this would mean that the third EMA value would be calculated as:
alpha = 2 / (2 + 1)
((1 - alpha)**2) * 1 + alpha * 2
which is 1.4444. However, the actual value when executing series.ewm(span=2, adjust=False).mean().iloc[-1] is 1.85714286.
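As a quick arithmetic check (a sketch, not pandas' actual code), the snippet below evaluates the expression above with exact fractions and then divides it by the sum of the two weights. The variable names are my own, and the normalization step is only an assumption suggested by the numbers; it is not stated in the quoted documentation.

```python
from fractions import Fraction

alpha = Fraction(2, 2 + 1)   # span = 2  ->  alpha = 2 / (span + 1)
w_old = (1 - alpha) ** 2     # weight of the first value (two positions back)
w_new = alpha                # weight of the last value

# The expression exactly as written in the question.
unnormalized = w_old * 1 + w_new * 2

# Assumption: the same weighted sum divided by the sum of the weights.
normalized = unnormalized / (w_old + w_new)

print(unnormalized, float(unnormalized))  # 13/9, about 1.4444
print(normalized, float(normalized))      # 13/7, about 1.857143
```

The normalized value agrees with the 1.85714286 that pandas reports for this input, which hints that the weights are rescaled to sum to 1.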
What's the exact formula in the case of a NaN value? The formula above doesn't make much sense, since the weights don't sum to 1 (here (1 - alpha)**2 + alpha = 7/9); it'd make more sense if both weights summed to 1.
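For reference, here is a minimal pure-Python sketch of a recurrence that reproduces the observed value for this input. The function name `ewm_mean_sketch` and its structure are hypothetical, it covers only `adjust=False` with `ignore_na=False`, and it assumes the old weight keeps decaying across NaN positions and the result is divided by the weight sum. I have only checked it against the single example above, so treat it as a guess at the behavior, not as the library's implementation.

```python
import math

def ewm_mean_sketch(values, alpha):
    """Hypothetical EWM mean for adjust=False, ignore_na=False (an assumption)."""
    out = []
    avg = math.nan   # running weighted average
    old_wt = 1.0     # weight currently carried by `avg`
    for x in values:
        is_obs = not math.isnan(x)
        if math.isnan(avg):
            # Nothing observed yet: start from the first non-NaN value.
            if is_obs:
                avg = x
        else:
            # Absolute positions: the old weight decays on every step, NaN or not.
            old_wt *= 1.0 - alpha
            if is_obs:
                # Divide by the weight sum so the weights effectively sum to 1.
                avg = (old_wt * avg + alpha * x) / (old_wt + alpha)
                old_wt = 1.0  # reset once an observation has been folded in
        out.append(avg)
    return out

result = ewm_mean_sketch([1.0, math.nan, 2.0], alpha=2 / 3)
print(result)  # third element is approximately 1.857143
```

Without any NaN values the weight sum is `(1 - alpha) + alpha = 1`, so the sketch reduces to the usual `(1 - alpha) * prev + alpha * x` update. It is written as a plain loop so that it could later be adapted for numba.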