1

I want to calculate the running average over an infinite (or a huge) number of iterations:

import random
epochs = 1e10
for epoch in range(epochs):
    new_value = random.randint(1, 100)
    new_running_average = get_avg(previous_average, new_value) # what is this function going to be?

The naive approach would be to add any new value new_value to a list, and then average the list at each iteration. But, this is not realistic as 1) the number of iterations is far too large, and 2) I have to create many of such averages for many parameters.

The existing SO questions I found around calculating running average (e.g., this or this) use a fix data list that is relatively small, so it's rather trivial.

Penguin
  • 1,923
  • 3
  • 21
  • 51
  • 2
    What if you only needed a running *sum*; how would you solve that? If you have the sum and the count, getting the average is trivial. – Scott Hunter Aug 01 '23 at 12:44
  • @ScottHunter That is a good point, since the running average would be the running sum divided by the current count -- correct? The issue is then saving a variable that contains the running sum which might get quite large. I'm wondering if there's a potentially more space-efficient way, but this might work as a potential solution – Penguin Aug 01 '23 at 12:49
  • 1
    How much "space" do you think the sum will take up? – Scott Hunter Aug 01 '23 at 12:59
  • You don't need to store (in a list or whatever) all the random numbers. You just need to sum them and keep a count of the number of times you add to an accumulator. Divide the running total (the accumulator) by the current count and you have your moving average – DarkKnight Aug 01 '23 at 13:06
  • "The existing SO questions I found around calculating running average (e.g., this or this) use a fix data list that is relatively small, so it's rather trivial." The top and accepted answer for the first of those links clearly shows a technique that handles an arbitrary amount of input data, by `yield`ing each value as it's computed. It works equally well with a lazy iterator input, too. – Karl Knechtel Aug 01 '23 at 13:36
  • Aside from that: if the question is really "given the current running average, the next value and the number of elements seen so far, how do I compute an average that incorporates the next value?", than that is *purely a math question, not a programming question*. – Karl Knechtel Aug 01 '23 at 13:38

2 Answers2

1

You need to keep track of the amount of numbers already in the average:

def get_avg(nums_in_average, previous_average, new_value):
    return ((nums_in_average*previous_average)+new_value)/(nums_in_average + 1)
0

Assuming your source is an iterator (or iterable), you can create a generator function that will go through it and spit out the current average as it goes. No need for a table:

def runAvg(source):
    average = 0
    for count,value in enumerate(source,1):
        average = (average*(count-1) + value) / count
        yield value, average

Usage:

import random

epochs = int(1e10)
source = (random.randint(1, 100) for _ in range(epochs))
for value,average in runAvg(source):
    print(value,average)

72 72.0
96 84.0
94 87.33333333333333
38 75.0
36 67.2
86 70.33333333333333
47 67.0
91 70.0
80 71.11111111111111
16 65.6
23 61.72727272727273
25 58.666666666666664
40 57.23076923076923
...

If you don't want to write your own generator, you can use accumulate and tee from itertools:

import random

epochs = int(1e10)
source = (random.randint(1, 100) for _ in range(epochs))

from itertools import accumulate, tee

source,cum = tee(source)
for count,total in enumerate(accumulate(cum),1):
    print(next(source),total/count)

40 40.0
10 25.0
35 28.333333333333332
66 37.75
49 40.0
19 36.5
86 43.57142857142857
69 46.75
19 43.666666666666664
95 48.8
43 48.27272727272727
Alain T.
  • 40,517
  • 4
  • 31
  • 51