When working with generators you can only pull out the items in a single pass. An alternative is to load the generator into a list and make multiple passes, but this costs both time and memory.
Can anyone think of a better way of computing the following metrics from a generator in a single pass? Ideally the code computes the count, sum, average, sd, max, min and any other stats you can think of.
UPDATE
My initial (horrid) code is in this gist: https://gist.github.com/3038746
Using the great suggestions from @larsmans, here is the final solution I went with. Using a namedtuple really helped.
import random
from math import sqrt
from collections import namedtuple

def stat(gen):
    """Returns the namedtuple Stat as below."""
    Stat = namedtuple('Stat', 'total, sum, avg, sd, max, min')
    it = iter(gen)
    x0 = next(it)       # seed the running values with the first item
    mx = mn = s = x0
    s2 = x0*x0          # running sum of squares
    n = 1
    for x in it:
        mx = max(mx, x)
        mn = min(mn, x)
        s += x
        s2 += x*x
        n += 1
    # Population sd via E[x^2] - E[x]^2. Under Python 2, / on ints truncates,
    # so avg and the variance are floored for integer input (hence avg=562 below).
    return Stat(n, s, s/n, sqrt(s2/n - s*s/n/n), mx, mn)

def random_int_list(size=100, start=0, end=1000):
    return (random.randrange(start,end,1) for x in xrange(size))

if __name__ == '__main__':
    r = stat(random_int_list())
    print r #Stat(total=100, sum=56295, avg=562, sd=294.82537204250247, max=994, min=10)
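One caveat with the sum-of-squares formula above: it subtracts two large, nearly equal numbers, so it can lose precision (or even go slightly negative) on float data, and an empty generator raises StopIteration. Below is a minimal sketch of an alternative, assuming Python 3 and swapping in Welford's running-variance update. The name `running_stat` is my own for illustration and is not from the gist; the field names and the population sd (divide by n) follow the code above.

```python
from collections import namedtuple
from math import sqrt

# Same field names as the Stat tuple above; 'total' is the item count.
Stat = namedtuple('Stat', 'total sum avg sd max min')


def running_stat(iterable):
    """Single-pass summary statistics using Welford's variance update.

    Hypothetical helper for illustration; returns the population sd.
    """
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError('running_stat() needs at least one value')

    n = 1
    total = mx = mn = first
    mean = float(first)
    m2 = 0.0  # running sum of squared deviations from the current mean

    for x in it:
        n += 1
        total += x
        if x > mx:
            mx = x
        if x < mn:
            mn = x
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # update the mean incrementally
        m2 += delta * (x - mean)  # old deviation times new deviation

    return Stat(n, total, mean, sqrt(m2 / n), mx, mn)


if __name__ == '__main__':
    print(running_stat(x for x in [2, 4, 4, 4, 5, 5, 7, 9]))
```

For the eight values in the usage line the mean is 5 and the population sd is 2, up to floating-point rounding. Because `m2` only ever accumulates products of deviations, it avoids the large cancelling terms of the `s2/n - s*s/n/n` form, at the cost of one extra division per item.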