33

I have a range of dates and a measurement on each of those dates. I'd like to calculate an exponential moving average for each of the dates. Does anybody know how to do this?

I'm new to python. It doesn't appear that averages are built into the standard python library, which strikes me as a little odd. Maybe I'm not looking in the right place.

So, given the following code, how could I calculate the moving weighted average of IQ points for calendar dates?

from datetime import date
days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
IQ = [110, 105, 90]

(there's probably a better way to structure the data, any advice would be appreciated)

Jason S
  • 184,598
  • 164
  • 608
  • 970
Jim
  • 11,229
  • 20
  • 79
  • 114

16 Answers16

23

EDIT: It seems that mov_average_expw() function from scikits.timeseries.lib.moving_funcs submodule from SciKits (add-on toolkits that complement SciPy) better suits the wording of your question.


To calculate an exponential smoothing of your data with a smoothing factor alpha (it is (1 - alpha) in Wikipedia's terms):

>>> alpha = 0.5
>>> assert 0 < alpha <= 1.0
>>> av = sum(alpha**n.days * iq 
...     for n, iq in map(lambda (day, iq), today=max(days): (today-day, iq), 
...         sorted(zip(days, IQ), key=lambda p: p[0], reverse=True)))
95.0

The above is not pretty, so let's refactor it a bit:

from collections import namedtuple
from operator    import itemgetter

def smooth(iq_data, alpha=1, today=None):
    """Perform exponential smoothing with factor `alpha`.

    Time period is a day.
    Each time period the value of `iq` drops `alpha` times.
    The most recent data is the most valuable one.
    """
    assert 0 < alpha <= 1

    if alpha == 1: # no smoothing
        return sum(map(itemgetter(1), iq_data))

    if today is None:
        today = max(map(itemgetter(0), iq_data))

    return sum(alpha**((today - date).days) * iq for date, iq in iq_data)

IQData = namedtuple("IQData", "date iq")

if __name__ == "__main__":
    from datetime import date

    days = [date(2008,1,1), date(2008,1,2), date(2008,1,7)]
    IQ = [110, 105, 90]
    iqdata = list(map(IQData, days, IQ))
    print("\n".join(map(str, iqdata)))

    print(smooth(iqdata, alpha=0.5))

Example:

$ python26 smooth.py
IQData(date=datetime.date(2008, 1, 1), iq=110)
IQData(date=datetime.date(2008, 1, 2), iq=105)
IQData(date=datetime.date(2008, 1, 7), iq=90)
95.0
jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • Hi J.F. Sebastian, I'd like to use this EWMA formula to show trends on my website. I've posted a question on SO — http://stackoverflow.com/questions/9283856/. Someone suggested the EWMA algorithm for this as I need to stress more on recent items than older ones. Since I have no experience with stats, I'm a little confused as to how I calculate the value of `α`. Any help? Thank you. – Mridang Agarwalla Feb 20 '12 at 15:09
  • The linked pages are not available anymore, could you update them? – sebix Oct 06 '14 at 16:05
  • @sebix: feel free to edit. If google doesn't help then try [wayback machine](https://archive.org/web/) – jfs Oct 06 '14 at 16:07
  • what is smoothing factor ? – Kshitij Agrawal Jun 03 '20 at 08:55
  • @KshitijAgrawal: click [the "exponential smoothing" link](http://en.wikipedia.org/wiki/Exponential_smoothing) in the answer. – jfs Jun 03 '20 at 16:42
  • `sum(alpha**((today - date).days) * iq for date, iq in iq_data)` gives you the exponent weighted sum. It needs to be divided by a denominator `sum(alpha**((today - date).days) for date, iq in iq_data)` to get the average – matohak Feb 13 '22 at 22:24
16

I'm always calculating EMAs with Pandas:

Here is an example how to do it:

import pandas as pd
import numpy as np

def ema(values, period):
    values = np.array(values)
    return pd.ewma(values, span=period)[-1]

values = [9, 5, 10, 16, 5]
period = 5

print ema(values, period)

More infos about Pandas EWMA:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ewma.html

Into Numbers
  • 923
  • 11
  • 19
  • 4
    Don't newer versions of Pandas have new and better [functions](http://pandas.pydata.org/pandas-docs/stable/api.html#exponentially-weighted-moving-window-functions)? – Cristian Ciupitu May 11 '16 at 14:10
  • 1
    `s.ewm(span = 2/alpha-1).mean()` where `s` is a Series – user3226167 Mar 07 '19 at 06:49
  • @user3226167 how do you make alpha = y ? – luky Dec 18 '21 at 13:34
  • @luky `alpha` means [smoothing factor](https://pandas.pydata.org/docs/reference/api/pandas.Series.ewm.html). Do you mean how to create `s` from numpy array? `s = pd.Series(y)` – user3226167 Dec 20 '21 at 08:25
  • @user3226167 no i thought that "alpha" is variable X, but then i found that the equation is already incorporated in the function and one just changes the static alpha parameter – luky Dec 22 '21 at 22:34
15

I did a bit of googling and I found the following sample code (http://osdir.com/ml/python.matplotlib.general/2005-04/msg00044.html):

def ema(s, n):
    """
    returns an n period exponential moving average for
    the time series s

    s is a list ordered from oldest (index 0) to most
    recent (index -1)
    n is an integer

    returns a numeric array of the exponential
    moving average
    """
    s = array(s)
    ema = []
    j = 1

    #get n sma first and calculate the next n period ema
    sma = sum(s[:n]) / n
    multiplier = 2 / float(1 + n)
    ema.append(sma)

    #EMA(current) = ( (Price(current) - EMA(prev) ) x Multiplier) + EMA(prev)
    ema.append(( (s[n] - sma) * multiplier) + sma)

    #now calculate the rest of the values
    for i in s[n+1:]:
        tmp = ( (i - ema[j]) * multiplier) + ema[j]
        j = j + 1
        ema.append(tmp)

    return ema
Joe Holloway
  • 28,320
  • 15
  • 82
  • 92
earino
  • 2,885
  • 21
  • 20
  • 2
    Why is the function using a local variable with the same name as the function? Apart from making the code slightly less legible, it could introduce hard to detect logical bugs further down the line ... – Homunculus Reticulli Jun 04 '12 at 11:13
  • 2
    what is the point of `s = array(s)`? I had syntax errors until I just commented it out. – swdev Nov 06 '17 at 07:43
  • @chjortlund I'm not sure what you mean by "every second item in the list will be the SMA". Current EMA value is based on previous one, but you have to start somewhere so SMA is taken as initial value of set. It's correct way to calculate EMA. – Zuku Jun 18 '21 at 22:02
  • @Zuku True, I have deleted my comment. Back when I made it, I was looking for an algorithm to process live incoming data, and the above snippet is not suitable for that use case (nor is it advertised as) - my mistake! – chjortlund Jul 05 '21 at 12:30
7

You can also use the SciPy filter method because the EMA is an IIR filter. This will have the benefit of being approximately 64 times faster as measured on my system using timeit on large data sets when compared to the enumerate() approach.

import numpy as np
from scipy.signal import lfilter

x = np.random.normal(size=1234)
alpha = .1 # smoothing coefficient
zi = [x[0]] # seed the filter state with first value
# filter can process blocks of continuous data if <zi> is maintained
y, zi = lfilter([1.-alpha], [1., -alpha], x, zi=zi)
papahabla
  • 1,448
  • 18
  • 18
6

I don't know Python, but for the averaging part, do you mean an exponentially decaying low-pass filter of the form

y_new = y_old + (input - y_old)*alpha

where alpha = dt/tau, dt = the timestep of the filter, tau = the time constant of the filter? (the variable-timestep form of this is as follows, just clip dt/tau to not be more than 1.0)

y_new = y_old + (input - y_old)*dt/tau

If you want to filter something like a date, make sure you convert to a floating-point quantity like # of seconds since Jan 1 1970.

Jason S
  • 184,598
  • 164
  • 608
  • 970
5

In matplotlib.org examples (http://matplotlib.org/examples/pylab_examples/finance_work2.html) is provided one good example of Exponential Moving Average (EMA) function using numpy:

def moving_average(x, n, type):
    x = np.asarray(x)
    if type=='simple':
        weights = np.ones(n)
    else:
        weights = np.exp(np.linspace(-1., 0., n))

    weights /= weights.sum()

    a =  np.convolve(x, weights, mode='full')[:len(x)]
    a[:n] = a[n]
    return a
sebix
  • 2,943
  • 2
  • 28
  • 43
pvstrln
  • 169
  • 3
  • 10
5

My python is a little bit rusty (anyone can feel free to edit this code to make corrections, if I've messed up the syntax somehow), but here goes....

def movingAverageExponential(values, alpha, epsilon = 0):

   if not 0 < alpha < 1:
      raise ValueError("out of range, alpha='%s'" % alpha)

   if not 0 <= epsilon < alpha:
      raise ValueError("out of range, epsilon='%s'" % epsilon)

   result = [None] * len(values)

   for i in range(len(result)):
       currentWeight = 1.0

       numerator     = 0
       denominator   = 0
       for value in values[i::-1]:
           numerator     += value * currentWeight
           denominator   += currentWeight

           currentWeight *= alpha
           if currentWeight < epsilon: 
              break

       result[i] = numerator / denominator

   return result

This function moves backward, from the end of the list to the beginning, calculating the exponential moving average for each value by working backward until the weight coefficient for an element is less than the given epsilon.

At the end of the function, it reverses the values before returning the list (so that they're in the correct order for the caller).

(SIDE NOTE: if I was using a language other than python, I'd create a full-size empty array first and then fill it backwards-order, so that I wouldn't have to reverse it at the end. But I don't think you can declare a big empty array in python. And in python lists, appending is much less expensive than prepending, which is why I built the list in reverse order. Please correct me if I'm wrong.)

The 'alpha' argument is the decay factor on each iteration. For example, if you used an alpha of 0.5, then today's moving average value would be composed of the following weighted values:

today:        1.0
yesterday:    0.5
2 days ago:   0.25
3 days ago:   0.125
...etc...

Of course, if you've got a huge array of values, the values from ten or fifteen days ago won't contribute very much to today's weighted average. The 'epsilon' argument lets you set a cutoff point, below which you will cease to care about old values (since their contribution to today's value will be insignificant).

You'd invoke the function something like this:

result = movingAverageExponential(values, 0.75, 0.0001)
jfs
  • 399,953
  • 195
  • 994
  • 1,670
benjismith
  • 16,559
  • 9
  • 57
  • 80
  • How do you apply it to the non-continues data when it is available at non-uniform time intervals e.g., an in the question: today, 5 days ago, 6 days ago? – jfs Jan 28 '09 at 23:33
  • The syntax is mostly correct, except: '||' -> 'or', '&&' -> 'and', 'list.length' -> 'len(list)', parentheses near `if`, `while` are unnecessary. You can create a copy of a list in Python: `result = values[:]` or create a big "empty" one: `result = [None]*len(values)`. – jfs Jan 28 '09 at 23:43
  • Conditions could be written as follows: if not 0 <= alpha <= 1: raise ValueError("out of range, expected 0..1 get: '%s'" % alpha) – jfs Jan 28 '09 at 23:57
  • Your algorithm is quadratic when (alpha==1 or epsilon==0). M=log(epsilon)/log(alpha) could be a large factor (number of time the internal loop is executed if len(values) is large), so I would not worry about `values.reverse()` -- it is just one more pass over the data. – jfs Jan 29 '09 at 00:06
  • There are algorithms that allows to compute AWME in one pass (see `ema()` from @earino's answer and `mov_average_expw()` from mine. – jfs Jan 29 '09 at 00:07
  • I'll fix the syntax errors a little later tonight or tomorrow (gotta go now). But on the subject of linear/quadratic time, I've always implemented SMA with a linear algorithm but never EMA, since you get different results by using SMA for the 1st N elements. And what if you only have alpha but no N? – benjismith Jan 29 '09 at 00:51
  • With respect to non-continuous data at non-uniform time intervals, I don't think it's the job of this function to handle those cases. A separate wrapper class could provide interpolation services on top of this data. – benjismith Jan 29 '09 at 00:54
3

I found the above code snippet by @earino pretty useful - but I needed something that could continuously smooth a stream of values - so I refactored it to this:

def exponential_moving_average(period=1000):
    """ Exponential moving average. Smooths the values in v over ther period. Send in values - at first it'll return a simple average, but as soon as it's gahtered 'period' values, it'll start to use the Exponential Moving Averge to smooth the values.
    period: int - how many values to smooth over (default=100). """
    multiplier = 2 / float(1 + period)
    cum_temp = yield None  # We are being primed

    # Start by just returning the simple average until we have enough data.
    for i in xrange(1, period + 1):
        cum_temp += yield cum_temp / float(i)

    # Grab the timple avergae
    ema = cum_temp / period

    # and start calculating the exponentially smoothed average
    while True:
        ema = (((yield ema) - ema) * multiplier) + ema

and I use it like this:

def temp_monitor(pin):
    """ Read from the temperature monitor - and smooth the value out. The sensor is noisy, so we use exponential smoothing. """
    ema = exponential_moving_average()
    next(ema)  # Prime the generator

    while True:
        yield ema.send(val_to_temp(pin.read()))

(where pin.read() produces the next value I'd like to consume).

Rikard Anglerud
  • 4,675
  • 2
  • 16
  • 3
3

May be shortest:

#Specify decay in terms of span
#data_series should be a DataFrame

ema=data_series.ewm(span=5, adjust=False).mean()

Yusufmet
  • 135
  • 1
  • 1
  • 6
3
import pandas_ta as ta

data["EMA3"] = ta.ema(data["close"], length=3)

pandas_ta is a Technical Analysis Library: https://github.com/twopirllc/pandas-ta. Above code calculates the Exponential Moving Average (EMA) for a series. You can specify the lag value using 'length'. Spesifically, above code calculates '3-day EMA'.

Cenk
  • 325
  • 4
  • 16
  • What does it do? What's the output of your code? Where is your description? Why is it better than other solutions? – Axisnix Feb 27 '22 at 13:11
2

Here is a simple sample I worked up based on http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages

Note that unlike in their spreadsheet, I don't calculate the SMA, and I don't wait to generate the EMA after 10 samples. This means my values differ slightly, but if you chart it, it follows exactly after 10 samples. During the first 10 samples, the EMA I calculate is appropriately smoothed.

def emaWeight(numSamples):
    return 2 / float(numSamples + 1)

def ema(close, prevEma, numSamples):
    return ((close-prevEma) * emaWeight(numSamples) ) + prevEma

samples = [
22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29,
22.15, 22.39, 22.38, 22.61, 23.36, 24.05, 23.75, 23.83, 23.95, 23.63,
23.82, 23.87, 23.65, 23.19, 23.10, 23.33, 22.68, 23.10, 22.40, 22.17,
]
emaCap = 10
e=samples[0]
for s in range(len(samples)):
    numSamples = emaCap if s > emaCap else s
    e =  ema(samples[s], e, numSamples)
    print e
user9170
  • 950
  • 9
  • 18
2

I'm a little late to the party here, but none of the solutions given were what I was looking for. Nice little challenge using recursion and the exact formula given in investopedia. No numpy or pandas required.

prices = [{'i': 1, 'close': 24.5}, {'i': 2, 'close': 24.6}, {'i': 3, 'close': 24.8}, {'i': 4, 'close': 24.9},
          {'i': 5, 'close': 25.6}, {'i': 6, 'close': 25.0}, {'i': 7, 'close': 24.7}]


def rec_calculate_ema(n):
    k = 2 / (n + 1)
    price = prices[n]['close']
    if n == 1:
        return price
    res = (price * k) + (rec_calculate_ema(n - 1) * (1 - k))
    return res


print(rec_calculate_ema(3))
Colin E
  • 21
  • 1
0

A fast way (copy-pasted from here) is the following:

def ExpMovingAverage(values, window):
    """ Numpy implementation of EMA
    """
    weights = np.exp(np.linspace(-1., 0., window))
    weights /= weights.sum()
    a =  np.convolve(values, weights, mode='full')[:len(values)]
    a[:window] = a[window]
    return a
silgon
  • 6,890
  • 7
  • 46
  • 67
  • Faster if you replace np.convolve with from scipy import signal , a = signal.convolve(values, weights, mode='full') [:len(values)] –  Oct 10 '19 at 00:55
  • `a[:window] = a[window -1 ] ` because it is out of bounds – PaSe Feb 23 '23 at 19:48
0

I am using a list and a rate of decay as inputs. I hope this little function with just two lines may help you here, considering deep recursion is not stable in python.

def expma(aseries, ratio):
    return sum([ratio*aseries[-x-1]*((1-ratio)**x) for x in range(len(aseries))])
0

more simply, using pandas

def EMA(tw):
    for x in tw:
        data["EMA{}".format(x)] = data['close'].ewm(span=x, adjust=False).mean()
        EMA([10,50,100])
Steve
  • 1,903
  • 5
  • 22
  • 30
0

Papahaba's answer was almost what I was looking for (thanks!) but I needed to match initial conditions. Using an IIR filter with scipy.signal.lfilter is certainly the most efficient. Here's my redux:

Given a NumPy vector, x

import numpy as np
from scipy import signal

period = 12
b = np.array((1,), 'd')
a = np.array((period, 1-period), 'd')
zi = signal.lfilter_zi(b, a)
y, zi = signal.lfilter(b, a, x, zi=zi*x[0:1])

Get the N-point EMA (here, 12) returned in the vector y