2

If I have a list of numbers, such as `l = [3, 5, 3, 6, 47, 89]`, I can calculate the minimum, maximum, and average with the following Python code:

minimum = min(l)
maximum = max(l)
avg = sum(l) / len(l)

Since each of these iterates over the entire list, this is slow for large lists and takes a lot of code. Is there a Python module that can calculate all of these values together?

Sirish
  • Under the hood, Python lists are arrays, with O(1) access. So it's not particularly slow to iterate over them. Just saying. – unwind Mar 04 '14 at 15:00
  • possible duplicate of [numpy: function for simultaneous max() and min()](http://stackoverflow.com/questions/12200580/numpy-function-for-simultaneous-max-and-min) – YXD Mar 04 '14 at 15:00
  • Any c++ programmer would puke at the idea of traversing 3 times. Which is why we have `std::minmax` now and similar goodies. – Bathsheba Mar 04 '14 at 15:02
  • 2
    @unwind: How can unordered arrays have O(1) for finding the minimum and maximum? – Sirish Mar 04 '14 at 15:03
  • 1
    @Sirish I didn't say that, I said that *iterating over a list* isn't too slow, since each element is accessed in O(1) time. – unwind Mar 04 '14 at 15:04
  • 1
    have you actually tried timing it to see how slow it is? optimizations that make sense in C, don't always make sense in python. – Corley Brigman Mar 04 '14 at 15:05
  • Sorry for the trigger happy duplicate marking. This is not a numpy question. – YXD Mar 04 '14 at 15:06
  • 2
    Interestingly, I've found that iterating once yourself is [slower](http://pastebin.com/Cj28nxeB) than letting the built-in functions iterate three times separately (a sketch of such a single-pass loop follows these comments). – Kevin Mar 04 '14 at 15:12
  • 3
    on the optimization analysis: traversing multiple times only really hurts you when traversing items is expensive, or you can reuse partial computations per entry. in this case, you have to do the same 2 comparisons + (up to) 3 assignments per item, and they are independent, whether you do a single traversal or multiple traversal. since lookups are O(1), traversals are not expensive, and the benefits from using built-ins (which are C primitives) greatly outweigh any possible benefits from a different algorithm implemented in pure python. – Corley Brigman Mar 04 '14 at 15:15
  • 1
    @CorleyBrigman You only have to do up to 2 assignments and 1 or 2 comparisons per item; once seeded with the first element of the list, no other value can simultaneously be a new min and a new max, so in a randomized list, you should be able to get away with less than 2 (on average) comparisons per element (and 1.5 assignments as well, as you only have to accumulate on each item, while roughly 1/2 the elements of the list should be neither a new min or max). In this case though, the advantages of builtins still appear to outweigh any optimizations from folding the operations together. – Silas Ray Mar 04 '14 at 15:40
  • 1
    @SilasRay - all that is true. though you probably don't save a lot of comparisons - on a truly randomly-distributed list, you'd get far less than 1/2 the elements being either a new min or max. on a sample run here of 10000 items randomly sampled from a space of about 10^7, an iterative solution only stepped the max 10 times. – Corley Brigman Mar 04 '14 at 15:49
  • 1
    @CorleyBrigman Interesting, though I guess now that I think about it more, that does make sense. I guess ultimately, what it comes down to is the nature of `l` from the OP. There might be optimizations if the data can be expected to follow trends, but not if it is really randomized. – Silas Ray Mar 04 '14 at 15:57
  • Using cython gave about an order of magnitude speedup, compared to using the 3 builtins. – M4rtini Mar 04 '14 at 16:07
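
For reference, a single-pass pure-Python loop of the kind Kevin timed might look like the sketch below (my reconstruction; the pastebin code may differ). Even though it touches each element only once, the interpreted loop body typically loses to `min`, `max`, and `sum`, which each iterate in C.

def single_pass_stats(values):
    # Seed min/max/total with the first element.
    it = iter(values)
    first = next(it)
    lo = hi = first
    total = first
    count = 1
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:  # no value can be both a new min and a new max
            hi = v
        total += v
        count += 1
    return lo, hi, total / count

l = [3, 5, 3, 6, 47, 89]
print(single_pass_stats(l))  # (3, 89, 25.5)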

2 Answers

3

If you have pandas installed, you can do something like this:

import numpy as np
import pandas
s = pandas.Series(np.random.normal(size=37))
stats = s.describe()

`stats` will be another Series that behaves like a dictionary:

print(stats)
count    37.000000
mean      0.072138
std       0.932000
min      -1.267888
25%      -0.688728
50%      -0.048624
75%       0.784244
max       2.501713
dtype: float64

stats['max']
2.501713
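
For the three values in the question specifically, you can unpack them in one line (`mean` is `describe()`'s name for the average), and so on for the other keys:

minimum, maximum, average = stats['min'], stats['max'], stats['mean']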

However, I don't recommend this unless you're simply striving for concise code. Here's why:

%%timeit
stats = s.describe()
# 100 loops, best of 3: 1.44 ms per loop

%%timeit
mymin = min(s)
mymax = max(s)
myavg = sum(s)/len(s)
# 10000 loops, best of 3: 89.5 µs per loop

I just can't imagine that your own implementation will be able to squeeze any more performance out of this than the built-in functions already do (barring some Cython voodoo, maybe).

Paul H
3

Cython function:

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def minmaxAvg(list x):
    cdef int i
    cdef int _min, _max, total
    # Seed min/max with the first element.
    _min = x[0]
    _max = x[0]
    total = 0
    for i in x:
        if i < _min: _min = i
        elif i > _max: _max = i
        total += i
    # Divide as floats so the average isn't truncated to an integer.
    return _min, _max, total / float(len(x))
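
To actually call this, the function has to be compiled first. One minimal sketch is a setuptools script, assuming the code above is saved as minmaxavg.pyx (the filename is my assumption); build with `python setup.py build_ext --inplace`, then `from minmaxavg import minmaxAvg`:

# setup.py -- minimal build script for the Cython module above
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("minmaxavg.pyx"))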

Pure Python function to compare against:

def builtinfuncs(x):
    a = min(x)
    b = max(x)
    avg = sum(x) / len(x)
    return a,b,avg


In [16]: x = [random.randint(0,1000) for _ in range(10000)]

In [17]: %timeit minmaxAvg(x)
10000 loops, best of 3: 34 µs per loop

In [18]: %timeit builtinfuncs(x)
1000 loops, best of 3: 460 µs per loop

Disclaimer:
- The speed result from Cython will depend on your hardware.
- This is not as flexible or foolproof as using the builtins; for example, you would have to change the function to handle anything other than integers.
- Before going down this path, you should ask yourself whether this operation really is a big bottleneck in your application. It probably isn't.

M4rtini