
I am trying to find a way to calculate a moving cumulative average without storing the count and total data that is received so far.

I came up with two algorithms but both need to store the count:

  • new average = ((old count * old average) + next data) / next count
  • new average = old average + (next data - old average) / next count

The problem with these methods is that the count gets bigger and bigger resulting in losing precision in the resulting average.

The first method uses the old count and next count which are obviously 1 apart. This got me thinking that perhaps there is a way to remove the count but unfortunately I haven't found it yet. It did get me a bit further though, resulting in the second method but still count is present.
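As a sketch, the two formulas above could be written like this in Python (function names are mine, purely illustrative):

```python
def new_average_v1(old_avg, old_count, next_data):
    # new average = ((old count * old average) + next data) / next count
    next_count = old_count + 1
    return (old_count * old_avg + next_data) / next_count

def new_average_v2(old_avg, old_count, next_data):
    # new average = old average + (next data - old average) / next count
    next_count = old_count + 1
    return old_avg + (next_data - old_avg) / next_count
```

Both compute the same cumulative average, and both still need the count.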

Is it possible, or am I just searching for the impossible?

asked by user1705674, edited by Kimmo Lehto

  • NB that numerically, storing the current total and current count is the most stable way. Otherwise, for higher counts, next data/(next count) will start to underflow. So **if you are really worried about losing precision, keep the totals!** – AlexR Aug 11 '16 at 16:02
  • See Wikipedia: https://en.wikipedia.org/wiki/Moving_average – xmedeko Sep 14 '18 at 07:02

8 Answers


You can simply do:

double approxRollingAverage (double avg, double new_sample) {
    avg -= avg / N;
    avg += new_sample / N;
    return avg;
}

Where N is the number of samples you want to average over. Note that this approximation is equivalent to an exponential moving average. See: Calculate rolling / moving average in C++
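As a quick illustration, here is a Python transliteration of the same update (the value of `N` and the choice to seed `avg` with the first sample are my assumptions, not part of the original answer):

```python
def approx_rolling_average(avg, new_sample, n):
    # keep (N-1)/N of the old average and mix in 1/N of the new sample
    avg -= avg / n
    avg += new_sample / n
    return avg

avg = 5.0  # seeding with the first sample avoids the slow warm-up from 0
for _ in range(10):
    avg = approx_rolling_average(avg, 5.0, 16)
# a constant input leaves the average at that constant
```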

answered by Maestro, edited by Martijn Courteaux

  • Don't you have to add 1 to N before this line? avg += new_sample / N; – Damian Jun 13 '16 at 20:28
  • @Damian No, because the rolling average is apparently calculated over a fixed number of values (last `N` values). –  Aug 11 '16 at 07:51
  • 26
    This is not entirely correct. What @Muis describes is an exponentially weighted moving averge, which is sometimes appropriate but is not precisely what the OP requested. As an example, consider the behaviour you expect when most of the points are in the range 2 to 4 but one value is upwards of a million. An EWMA (here) will hold onto traces of that million for quite some time. A finite convolution, as indicated by OP, would lose it immediately after N steps. It does have the advantage of constant storage. – jma Mar 20 '17 at 08:26
  • 11
    That's not a moving average. What you describe is a one pole filter that creates exponential responses to jumps in the signal. A moving average creates a linear response with length N. – ruhig brauner Jun 20 '17 at 10:14
  • 4
    Beware that this is quite far from the common definition of average. If you set N = 5 and enter 5 `5` samples, the average will be 0.67. – Dan Dascalescu Jan 09 '18 at 08:28
  • I've implemented this algorithm and my running average always increases - it never decreases. obviously that's a concern so I'm not sure what's wrong. – fIwJlxSzApHEZIl Mar 06 '18 at 01:58
  • 3
    @DanDascalescu While you're correct that it's not actually a rolling average, your stated value is off by an order of magnitude. With `avg` initialized to `0`, you end up with `3.36` after 5 `5`s, and `4.46` after 10: http://cpp.sh/2ryql For long averages, this is certainly a useful approximation. – cincodenada Apr 13 '18 at 22:52
  • 4
    @DanDascalescu this is assuming `avg` is initialized with 0. If you initialize it to the first element instead, it behaves much better. In your example it will be `5`s all the way. – Dimagog Jan 05 '19 at 03:06
New average = old average * (n-1)/n + new value/n

This assumes the count increased by only one value. If it instead increased by the values in a batch M, then:

new average = old average * (n - len(M))/n + (sum of values in M)/n

This is the mathematical formula (I believe the most efficient one); I trust you can turn it into code yourself.
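As a sketch, both updates in Python (assuming `n` is the count *after* the new value(s) are included, which is how the comments below interpret it):

```python
def add_one(old_avg, n, new_value):
    # new average = old average * (n-1)/n + new value/n
    return old_avg * (n - 1) / n + new_value / n

def add_many(old_avg, n, new_values):
    # new average = old average * (n - len(M))/n + sum(M)/n
    m = len(new_values)
    return old_avg * (n - m) / n + sum(new_values) / n
```

For example, the mean of [1, 2] is 1.5; adding the value 3 (so n becomes 3) gives `add_one(1.5, 3, 3)` = 2.0, the mean of [1, 2, 3].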

answered by Abdullah Al-Ageel, edited by Mikhail
  • What is sum of new value? is that different somehow from "new value" in your original formula? – Mikhail Jun 04 '16 at 21:00
  • @Mikhail in the second example, there are `m` new values being factored into the new average. I believe that `sum of new value` here is meant to be the sum of the `m` new values being used to compute the new average. – Patrick Goley Mar 01 '17 at 06:23
  • Slightly more efficient for the first one: `new_average = (old_average * (n-1) + new_value) / n` -- Removes one of the divides. – Pixelstix Sep 13 '17 at 21:40
  • How about running average of 3 elements with 6,0,0,9? – Kumar Roshan Mehta Dec 02 '17 at 00:37
  • When I implement this equation, the value of the running average always slowly increases. It never goes down - only up. – fIwJlxSzApHEZIl Mar 06 '18 at 01:52
  • Is `n` the total after adding the new element or before adding the new element? – David Callanan Aug 24 '21 at 10:00
  • @Pixelstix Your solution is perfectly valid but keep in mind that it is much more likely to overflow than the original solution. Depending on the likelihood of overflows over time, one should opt for one approach vs the other. – Gili Oct 05 '21 at 01:18
  • @DavidCallanan In this example, after. I agree it is slightly awkward this way. You would have to adjust the formula slightly to use an n that is the total before the new element. – Pixelstix Oct 05 '21 at 13:47
  • @Gili Yes, that would be a trade-off. Follow-up questions: Would overflow happen only one iteration sooner? And if these are doubles, would losing precision potentially become a problem well before overflow? – Pixelstix Oct 05 '21 at 13:48

Here's yet another answer offering commentary on how Muis's, Abdullah Al-Ageel's, and Flip's answers are all mathematically the same thing, just written differently.

Sure, we have José Manuel Ramos's analysis explaining how rounding errors affect each slightly differently, but that's implementation dependent and would change based on how each answer is applied in code.

There is however a rather big difference

It's in Muis's N, Flip's k, and Abdullah Al-Ageel's n. Abdullah Al-Ageel doesn't quite explain what n should be, but N and k differ in that N is "the number of samples you want to average over" while k is the count of values sampled. (Although I have doubts as to whether calling N the number of samples is accurate.)

And here we come to the answer below. It's essentially the same old exponential weighted moving average as the others, so if you were looking for an alternative, stop right here.

Exponential weighted moving average

Initially:

average = 0
counter = 0

For each value:

counter += 1
average = average + (value - average) / min(counter, FACTOR)

The difference is the min(counter, FACTOR) part. This is the same as saying min(Flip's k, Muis's N).

FACTOR is a constant that affects how quickly the average "catches up" to the latest trend. The smaller the number, the faster. (At 1 it's no longer an average and just becomes the latest value.)

This answer requires a running counter, counter. If that's problematic, the min(counter, FACTOR) can be replaced with just FACTOR, turning it into Muis's answer. The problem with doing this is that the moving average is then affected by whatever average is initialized to. If it was initialized to 0, that zero can take a long time to work its way out of the average.
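A minimal Python sketch of the update above (the FACTOR value here is arbitrary):

```python
FACTOR = 16  # how quickly the average "catches up"; smaller = faster

def update(average, counter, value):
    counter += 1
    average += (value - average) / min(counter, FACTOR)
    return average, counter

avg, cnt = 0.0, 0
for x in (10.0, 20.0, 30.0):
    avg, cnt = update(avg, cnt, x)
# while counter < FACTOR this is an exact running mean: (10+20+30)/3 = 20.0
```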

How it ends up looking

[Graph: exponential moving average]

answered by antak
  • Well explained. I just miss a plain average in your graph, because that's what the OP asked for. – xmedeko Sep 14 '18 at 06:52
  • Maybe I'm missing something, but did you, by chance, mean `max(counter, FACTOR)`. `min(counter, FACTOR)` will always return FACTOR, right? – WebWanderer Oct 03 '19 at 18:44
  • I believe the point of the `min(counter, FACTOR)` is to account for the warm-up period. Without it, if your FACTOR (or N, or desired sample count) is 1000, then you'll need at least 1000 samples before you get an accurate result, since all updates before that will assume you have 1000 samples, when you may only have 20. – rharter Jan 09 '20 at 17:18
  • It would be nice to stop counting after reaching the factor; probably it would be faster that way. – inf3rno Apr 14 '20 at 05:48
  • The exponentially weighted moving average is really just a terrible Infinite Impulse Response (IIR) low-pass filter. It would likely be better to just implement a proper single-order Butterworth IIR. I'll need to check again, but I vaguely remember that the gain of the exponentially weighted moving average is not unity, unlike the Butterworth IIR. – Flip Oct 28 '20 at 07:51

From a blog on running sample variance calculations, where the mean is also calculated using Welford's method:

[Avg at n] = [Avg at n-1] + (x - [Avg at n-1]) / n

Too bad we can't upload SVG images.
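In code, Welford's mean update looks like this (a Python sketch):

```python
def welford_mean(samples):
    # avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n
    avg = 0.0
    for n, x in enumerate(samples, start=1):
        avg += (x - avg) / n
    return avg
```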

answered by Flip, edited by Good Night Nerd Pride
  • This is similar to what Muis implemented, except that the divide is used as a common factor. Thus only one division. – Flip Jun 15 '16 at 08:41
  • It's actually closer to @Abdullah-Al-Ageel (essentially commutative math) in that Muis doesn't account for incrementing N; copy-paste formula reference: [Avg at n] = [Avg at n-1] + (x - [Avg at n-1]) / n – drzaus Aug 10 '16 at 20:38
  • @Flip & drzaus: Aren't the Muis and Abdullah Al-Ageel solutions exactly the same? It's the same computation, just written differently. For me those 3 answers are identical, this one being more visual (too bad we can't use MathJax on SO). – user276648 Oct 11 '16 at 00:50

An example using JavaScript, for comparison:

https://jsfiddle.net/drzaus/Lxsa4rpz/

function calcNormalAvg(list) {
    // sum(list) / len(list)
    return list.reduce(function(a, b) { return a + b; }) / list.length;
}
function calcRunningAvg(previousAverage, currentNumber, index) {
    // [ avg' * (n-1) + x ] / n
    return ( previousAverage * (index - 1) + currentNumber ) / index;
}

(function(){
  // populate base list
  var list = [];
  function getSeedNumber() { return Math.random()*100; }
  for(var i = 0; i < 50; i++) list.push( getSeedNumber() );

  // the moving-average calculation, for comparison with the two functions above
  function calcMovingAvg(accumulator, new_value, alpha) {
    return (alpha * new_value) + (1.0 - alpha) * accumulator;
  }

  // start our baseline
  var baseAvg = calcNormalAvg(list);
  var runningAvg = baseAvg, movingAvg = baseAvg;
  console.log('base avg: %d', baseAvg);

  var okay = true;

  // table of output, cleaner console view
  var results = [];

  // add 10 more numbers to the list and compare calculations
  for(var n = list.length, i = 0; i < 10; i++, n++) {
    var newNumber = getSeedNumber();

    runningAvg = calcRunningAvg(runningAvg, newNumber, n+1);
    movingAvg = calcMovingAvg(movingAvg, newNumber, 1/(n+1));

    list.push(newNumber);
    baseAvg = calcNormalAvg(list);

    // assert and inspect
    console.log('added [%d] to list at pos %d, running avg = %d vs. regular avg = %d (%s), vs. moving avg = %d (%s)'
      , newNumber, list.length, runningAvg, baseAvg, runningAvg == baseAvg, movingAvg, movingAvg == baseAvg
    );
    results.push( {x: newNumber, n: list.length, regular: baseAvg, running: runningAvg, moving: movingAvg, eqRun: baseAvg == runningAvg, eqMov: baseAvg == movingAvg } );

    if(runningAvg != baseAvg) console.warn('Fail!');
    okay = okay && (runningAvg == baseAvg);
  }

  console.log('Everything matched for running avg? %s', okay);
  if(console.table) console.table(results);
})();
answered by drzaus

Flip's answer is numerically more consistent than Muis's.

Using the double number format, you can see the roundoff problem in the Muis approach:

The Muis approach

When you divide and then subtract, a roundoff appears in the previously stored value, changing it.

The Flip approach, however, preserves the stored value and reduces the number of divisions, hence reducing the roundoff and minimizing the error propagated to the stored value. Adding only introduces roundoff if there is something to add (when N is big, there is nothing to add).

The Flip approach

These differences become remarkable when you take the mean of big values whose true mean tends to zero.

I show you the results using a spreadsheet program:

First, the results obtained: Results

The A and B columns are the n and X_n values, respectively.

The C column is the Flip approach and the D column is the Muis approach, i.e. the result stored in the mean. The E column corresponds to the intermediate value used in the computation.

The next graph shows the means at even counts:

Graph

As you can see, there are big differences between the two approaches.
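The spreadsheet experiment can be reproduced with a short Python sketch (an illustration, not the original spreadsheet; the alternating ±1e15 input matches the setup described in the comments below, where the true mean after every even step is exactly 0, so any nonzero even-step mean is pure roundoff error):

```python
def muis_update(avg, x, n):
    # two divisions: avg*(n-1)/n + x/n
    return avg * (n - 1) / n + x / n

def flip_update(avg, x, n):
    # one division: avg + (x - avg)/n
    return avg + (x - avg) / n

muis = flip = 0.0
muis_err = flip_err = 0.0
for n in range(1, 1001):
    x = 1e15 if n % 2 == 0 else -1e15
    muis = muis_update(muis, x, n)
    flip = flip_update(flip, x, n)
    if n % 2 == 0:  # true mean is exactly 0 after each even step
        muis_err = max(muis_err, abs(muis))
        flip_err = max(flip_err, abs(flip))
# both errors are roundoff only, tiny relative to 1e15, but they differ
```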

  • Not really an answer, but useful info. It would be even better if you added a 3rd line to your graph, for the true average over *n* past values, so we could see which of the two approaches comes the closest. – jpaugh Nov 17 '17 at 06:57
  • @jpaugh: The B column is alternating between -1.00E+15 and 1.00E+15, so when N is even, the actual mean should be 0. The graph's title is "Even partial means". This means that the 3rd line you ask about is simply f(x)=0. The graph shows that both approaches introduce errors that keep going up and up. – desowin Feb 19 '18 at 08:31
  • That's correct, the graph shows exactly the error propagated using big numbers involved in the calculations using both approaches. – José Manuel Ramos Feb 19 '18 at 18:53
  • The legend of your graph has the wrong colors: Muis's is orange, Flip's is blue. – xmedeko Sep 14 '18 at 07:15

A neat Python solution based on the above answers:

class RunningAverage:
    def __init__(self):
        self.average = 0
        self.n = 0

    def __call__(self, new_value):
        self.n += 1
        self.average = (self.average * (self.n - 1) + new_value) / self.n

    def __float__(self):
        return self.average

    def __repr__(self):
        return "average: " + str(self.average)

usage:

x = RunningAverage()
x(0)
x(2)
x(4)
print(x)  # average: 2.0
answered by Dima Lituiev

In Java 8:

LongSummaryStatistics movingAverage = new LongSummaryStatistics();
movingAverage.accept(newData);
...
average = movingAverage.getAverage();

There are also IntSummaryStatistics, DoubleSummaryStatistics, ...

answered by jmhostalet