21

How can you add and subtract numbers in an average without having to iterate through the entire list?

This can be very useful in many situations. For example to continuously calculate the average of the last X values in a stream, adding two averages together, and updating a rating based on a new user vote.

Sam Olesen
  • 944
  • 2
  • 8
  • 21
  • 1
    This is called [incremental averaging](https://math.stackexchange.com/questions/106700/incremental-averageing/1836447) and was answered on Math.SE. – Dan Dascalescu Jan 13 '20 at 04:03

2 Answers2

52

It is indeed possible to manipulate single values in an average in constant time, O(1).

The following function adds a number to an average. average is the current average, size is the current number of values in the average, and value is the number to add to the average:

double addToAverage(double average, int size, double value)
{
    return (size * average + value) / (size + 1);
}

Likewise, the following function removes a number from the average:

double subtractFromAverage(double average, int size, double value)
{
    // if (size == 1) return 0;       // wrong but then adding a value "works"
    // if (size == 1) return NAN;     // mathematically proper
    // assert(size > 1);              // debug-mode check
    // if(size < 2) throw(...)        // always check
    return (size * average - value) / (size - 1);
}

You might consider returning 0 as the average of a set of size 0 just so adding a value back in will give that value as the average. But if you want to consider it a bug to ever reduce your set to size 0, returning NAN will propagate that to future uses, making it more visible. But see What is the arithmetic mean of an empty sequence? - you might want to just noisily report the error on the spot, or throw a C++ exception (not just raise an FP exception) if it's a bug for this to ever happen.

If you don't special case it, you'll probably get + or -Inf, from a x / 0. with non-zero x, unless the value you remove is exactly equal to the current average; then you'll get 0. / 0. => NaN.


You can also combine these functions to easily replace a number. This is very convenient if you are calculating the average of the last X numbers in an array/stream.

double replaceInAverage(double average, int size, double oldValue, double newValue)
{
    return (size * average - oldvalue + newValue) / size;
}

It is also possible to calculate the total average of two averages in constant time:

double addAveragesTogether(double averageA, int sizeA, double averageB, int sizeB)
{
    return (sizeA * averageA + sizeB * averageB) / (sizeA + sizeB);
}
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Sam Olesen
  • 944
  • 2
  • 8
  • 21
  • While `addToAverage` is correct, note that precision errors are likely to be smaller when using this [alternative formula](https://math.stackexchange.com/a/106720/127179). – Dan Dascalescu Jan 13 '20 at 04:18
  • `subtractFromAverage ` would throw an error if `size` is `1`. I would add `if (oldSize == 1) return 0;` – Yousif Oct 07 '20 at 12:46
  • @Yousif: I'm not sure silently returning `0` is better for all use-cases. If anything, NaN would be more appropriate. (The current code will actually return `+-Inf` which is not good either, unless `average == value` to get `0. / 0.` => NaN). I guess the advantage to returning `0` is that adding to the average will set the average to that. – Peter Cordes Oct 07 '20 at 23:12
  • 1
    Also note that FP division is pretty expensive; this is still generally worth it but not as cheap as just adding and multiplying. (If `size` is a compile-time constant, you could do `double inverse = 1. / size;` but that might not be exact and could accumulate error over repeated use.) – Peter Cordes Oct 07 '20 at 23:40
23

The typical way already mentioned is:

( n * a + v ) / (n + 1);

Where n is our old count, a is our old average, and v is our new value.

However, the n * a part will ultimately overflow as n gets bigger, especially if a itself is large. To avoid this use:

a + ( v - a ) / (n + 1)

As n increases we do lose some precision - naturally we are modifying a by successively smaller amounts. Batching values can mitigate the problem, but is probably overkill for most tasks.

c z
  • 7,726
  • 3
  • 46
  • 59
  • 1
    If someone is interested why the second equation works as well, you can find a nice explanation here: https://math.stackexchange.com/a/1836447/709688 – JannisW Sep 29 '19 at 11:37
  • but is there an alternative for removal and replacement as well? – Barnack Jun 08 '20 at 20:31
  • Note that floating point keeps the same *relative* accuracy at all scales, so multiplying and then dividing by similar-sized numbers doesn't lose much precision; there's only a problem if it *actually* overflows past DBL_MAX, about `1.79769e+308` which is extremely huge. The other major numerical problem is adding a small number to a big number with `n*a + v` or `a + v/n`. If `v/n` is less than 1ULP of `a`, adding it won't even flip the low bit of the mantissa of `a`. i.e. if `|v| < |a|/2^53` or so. Even if `v` is not quite that small, you can still be losing most of its precision. – Peter Cordes Oct 07 '20 at 23:22
  • @PeterCordes Yes, this compares equation 2 to recalculating the average from scratch. Equation 1 still has the same problem though - as `n*a` approaches `MAX` then `n*a + v = n*a`. Recalculating the average using a suitable datatype will always be better, but isn't always possible (or necessary), as in the OP's case. – c z Nov 08 '21 at 09:40
  • 2
    @Barnack To remove an item from the average, remove the effect of that item from the current average, i.e. `a-(v-a)/(n-1)`. (where `n` and `a` represent the number of items and average before the removal of `v`). – c z Nov 08 '21 at 09:56