
To speed up pandas' EWM mean calculation, I'm replicating it with the numba library. However, I can't figure out how the calculation is done when NaN values are present.

The documentation states the following:

When ignore_na is False (default), weights are based on absolute positions. For example, the weights of x and y used in calculating the final weighted average of ... (1-alpha)**2 and alpha (if adjust is False).
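To make the quoted rule concrete, here is a small sketch of the raw (unnormalized) weights that, on my reading of the pandas documentation, apply to the first and last values of a three-element array whose middle value is missing. The helper name `raw_weights` is hypothetical and is not a pandas function.

```python
def raw_weights(alpha: float, adjust: bool, ignore_na: bool) -> tuple:
    """Raw weights of x0 and x2 in [x0, None, x2] (hypothetical helper, not pandas API).

    ignore_na=False: x0 sits two absolute positions back, so it decays by (1 - alpha)**2.
    ignore_na=True:  the NaN is skipped, so x0 is only one relative position back.
    The newest value gets weight 1 (adjust=True) or alpha (adjust=False).
    """
    steps_back = 1 if ignore_na else 2
    old = (1 - alpha) ** steps_back
    new = 1.0 if adjust else alpha
    return old, new


alpha = 2 / (2 + 1)  # span=2 -> alpha = 2 / (span + 1)
for adjust in (True, False):
    for ignore_na in (False, True):
        print(f"adjust={adjust}, ignore_na={ignore_na}: {raw_weights(alpha, adjust, ignore_na)}")
```

Note that in none of the four cases do the two weights sum to 1 by themselves.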

If I set span to 2 and use the array [1, None, 2], this would mean that the third EMA value is calculated as:

alpha = 2 / (2 + 1)
((1 - alpha)**2) * 1 + alpha * 2

which evaluates to 13/9 ≈ 1.4444. However, the actual value of series.ewm(span=2, adjust=False).mean().iloc[-1] is 1.85714286.

What's the exact formula when a NaN value is present? The formula above doesn't make much sense, since the two weights don't sum to 1; it would make more sense if they did.
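As a check on the arithmetic, the sketch below evaluates the documented weights for [1, None, 2] with span=2 and adjust=False. The plain weighted sum gives 13/9 ≈ 1.4444, while dividing by the sum of the weights gives 13/7 ≈ 1.857142857, which matches the value pandas reports. This suggests the weights are normalized by their sum rather than summing to 1 on their own.

```python
alpha = 2 / (2 + 1)  # span=2 -> alpha = 2 / (span + 1)
x0, x2 = 1.0, 2.0

# Raw weights from the docs for ignore_na=False, adjust=False
w0 = (1 - alpha) ** 2  # x0 is two absolute positions back
w2 = alpha             # weight of the newest observation

unnormalized = w0 * x0 + w2 * x2
normalized = (w0 * x0 + w2 * x2) / (w0 + w2)

print(f"unnormalized: {unnormalized:.6f}")  # 13/9
print(f"normalized:   {normalized:.6f}")    # 13/7
```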

Rok Povsic

1 Answer


Check out the following numba version of pandas' ewm().mean(). Hope this helps.

import math

import numpy as np
from numba import boolean, float64, jit


@jit((float64[:], float64, boolean, boolean), nopython=True, nogil=True)
def _numba_ema(X, alpha, adjust, ignore_na):
    """Exponentially weighted moving average with decay factor ``alpha``

    Reference:
    https://stackoverflow.com/questions/42869495/numpy-version-of-exponential-weighted-moving-average-equivalent-to-pandas-ewm

    Example:
        >>> X = np.array([1.0, np.nan, 2.0])
        >>> alpha = 2 / (2 + 1)  # span=2
        >>> ignore_na = True  # or False
        >>> adjust = True  # or False
        >>> myema = _numba_ema(X, alpha, adjust, ignore_na)
        >>> pdema = pd.Series(X).ewm(alpha=alpha, adjust=adjust, ignore_na=ignore_na).mean().values
        >>> print(np.allclose(myema, pdema, equal_nan=True))
        True

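    Args:
        X (array): raw data (1-D float64 array; NaN marks missing values)
        alpha (float): decay factor, 0 < alpha <= 1
        adjust (boolean):
            True to divide by the decaying sum of weights (weighted-average form)
            False to use the recursive form (y[t] = (1 - alpha) * y[t-1] + alpha * x[t] when there are no NaNs)
        ignore_na (boolean): True for decaying by relative position, False for absolute position

    Returns:
        np.ndarray: the exponentially weighted moving average, same length as X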
    """
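    ewma = np.empty_like(X)  # X is already float64
    ewma_old = 0.0  # running weighted sum of observations (used when adjust=True)
    offset = 1      # steps since the last observation (used when ignore_na=False)
    w = 1.0         # running sum of weights (used when adjust=True)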
    for i, x in enumerate(X):
        if i == 0:
            ewma[i] = x
            ewma_old = x
        else:
            is_ewma_nan = math.isnan(ewma[i - 1])
            is_x_nan = math.isnan(x)
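            if is_ewma_nan and is_x_nan:
                # No observation seen yet
                ewma[i] = np.nan
            elif is_ewma_nan:
                # First observation after leading NaNs starts the average
                ewma[i] = x
                ewma_old = x
            elif is_x_nan:
                # Missing value: carry the last average forward and widen the gap
                offset += 1
                ewma[i] = ewma[i - 1]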
            else:
                if ignore_na:
                    if adjust:
                        w = w * (1 - alpha) + 1
                        ewma_old = ewma_old * (1 - alpha) + x
                        ewma[i] = ewma_old / w
                    else:
                        ewma[i] = ewma[i - 1] * (1 - alpha) + x * alpha
                else:
                    if adjust:
                        w = w * (1 - alpha) ** offset + 1
                        ewma_old = ewma_old * (1 - alpha) ** offset + x
                        ewma[i] = ewma_old / w
                    else:
                        # Normalize so the decayed old weight and alpha are divided by their sum
                        ewma[i] = (ewma[i - 1] * (1 - alpha) ** offset + x * alpha) / ((1 - alpha) ** offset + alpha)
                    offset = 1
    return ewma
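To sanity-check the function above without numba, here is a plain-Python sketch of the same logic written the other way round: it keeps a single, already normalized running average plus the total weight of the older history, in the spirit of pandas' own loop. This is my own port, not pandas source; the name `ewma_reference` is hypothetical, and `min_periods` handling is omitted.

```python
import math


def ewma_reference(xs, alpha, adjust=True, ignore_na=False):
    """Plain-Python EWMA (hypothetical helper) for cross-checking _numba_ema."""
    out = []
    avg = math.nan  # current weighted average (NaN until the first observation)
    old_wt = 1.0    # total weight carried by the history before the next value
    new_wt = 1.0 if adjust else alpha
    for x in xs:
        is_obs = not math.isnan(x)
        if not math.isnan(avg):
            # Decay the history at every step, or only at observed steps if ignore_na
            if is_obs or not ignore_na:
                old_wt *= 1 - alpha
                if is_obs:
                    avg = (old_wt * avg + new_wt * x) / (old_wt + new_wt)
                    # adjust=True accumulates weight; adjust=False resets it to 1
                    old_wt = old_wt + new_wt if adjust else 1.0
        elif is_obs:
            avg = x  # first observation starts the average
        out.append(avg)
    return out


alpha = 2 / (2 + 1)  # span=2
for adjust in (True, False):
    for ignore_na in (False, True):
        print(adjust, ignore_na, ewma_reference([1.0, math.nan, 2.0], alpha, adjust, ignore_na))
```

For [1.0, nan, 2.0] and span=2, the last value comes out as 1.9, 13/7 ≈ 1.857, 1.75 and 5/3 ≈ 1.667 for (adjust, ignore_na) = (True, False), (False, False), (True, True) and (False, True); the (False, False) case matches the 1.85714286 from the question. Comparing this output with `_numba_ema` via `np.allclose(..., equal_nan=True)` is one way to test the numba port.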
Gabriel