3

My question arises from answers posted to How to find the missing numbers in an arbitrary list in python 3?.

Most solutions suggest using something akin to

a = [10,12,13,8]
# get set of full numbers
allNums = set( (x for x in range(min(a),max(a)+1)))
# do some kind of set operation / symetric difference 

This needs 2 iterations of a to fetch min(a) and max(a) as values from the list to build the range that entails all numbers between min(a) and max(a).

It is easy to simplify this to just one pass of a:

def minmax(data):
    """Get the min and max of an iterable in O(n) time and constant space."""
    minValue = data[0]
    maxValue = data[0]
    for d in data[1:]:
        minValue = d if d < minValue else minValue
        maxValue = d if d > maxValue else maxValue
    return (minValue,maxValue)

that retrieves both in O(n) time and constant space.

Is there some way to do this with built-ins / modules in python?

Edit: Agreed: min() and max() are both O(n) as well - but used twice (which is constant and reduced to O(n) - yep) - but it is still slower to do it twice then once.


Edit with some benchmarkings:

import timeit

# 100k random numbers to min/max upon
data = """import random
random.seed(42)
data = random.choices(range(1000000),k=100000)"""

Functool reduce approach:

t1 = timeit.timeit("""
mi,ma=minmax(data)
""",setup="""
import functools

def minmax(aa):
    return functools.reduce(lambda mm,xx : ( min(mm[0],xx),max(mm[1],xx)) , aa, ( aa[0],aa[0],))
""" + data, number = 1000 )

Simple min / max usage:

t2 = timeit.timeit("""
mi,ma=min(data),max(data)
""",setup=data, number = 1000)

One pass try with if/elif to cut on comparisons:

t3 = timeit.timeit("""
mi,ma=minmax(data) 
""",setup="""
def minmax(data):
    minValue = data[0]
    maxValue = data[0]
    for d in data[1:]:
        if d < minValue:     # changed to if / elif: in a vain attempt to make it faster
            minValue = d     # its closer to the proposed solution in the numpy-question 
        elif d > maxValue:   # linked above
            maxValue = d
    return (minValue,maxValue)
""" + data, number = 1000)

One pass try without if/elif (more comparisons needed):

t4 = timeit.timeit("""
mi,ma=minmax(data) 
""",setup="""
def minmax(data):
    minValue = data[0]
    maxValue = data[0]
    for d in data[1:]:
        minValue = d if d < minValue else minValue 
        maxValue = d if d > maxValue else maxValue 
    return (minValue,maxValue)
""" + data, number = 1000)

Which lead to:

minmanx-reduce:      148.5929143627707   
default min + max:     3.376458476185718     # ouch .. seems we just use these
minmax1passOptimized: 15.975109436292087   
minmax1pass:          20.29275910515082
Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
  • 4
    O(n + n) = O(n) – taras Sep 16 '18 at 12:53
  • 3
    Just use `min(list)` and `max(list)` – bigbounty Sep 16 '18 at 12:54
  • @fafl - consider a list of 1000 elements. What work has min to do to find the smalles value? what work has max to do to find the larges value? so why would I like to check the whole list twice to get what I can get by passing trhough it once? – Patrick Artner Sep 16 '18 at 12:54
  • @taras - you are of course right. for practical purposes 2n is still slower then n, is it? – Patrick Artner Sep 16 '18 at 12:55
  • 2
    @PatrickArtner, it depends on your overall algorithm complexity and the number of times you need to find min and max. I hardly can think of the problem when this 2n cases a bottleneck. – taras Sep 16 '18 at 12:57
  • 1
    @PatrickArtner, you might find answers to [this](https://stackoverflow.com/questions/12200580/numpy-function-for-simultaneous-max-and-min) similar question interesting – taras Sep 16 '18 at 12:57
  • @taras Wow - nice find. Gonna read that. – Patrick Artner Sep 16 '18 at 13:00
  • 1
    First of all, both the solutions you propose take 2n (O(n) asymptotically). If you want an even better approach you could find both min and max in less than 2n comparisons, actually you could find that in 1.5n comparisons, see here for example: http://www.cs.nthu.edu.tw/~wkhon/algo09/lectures/lecture8.pdf .Of course asymptotically both ways are O(n) but practically 1.5n is faster than 2n comparisons. – coder Sep 16 '18 at 13:06
  • @coder ugly font used - but good lecture. I did not think in terms of comparisons needed but just in a "process whole list once vs. twice". With the numpy OP that taras dug out, it seems its more about cache hits/misses then what I saw as "slow" as I am unable to save lots of comparisons on the way. – Patrick Artner Sep 16 '18 at 13:16

1 Answers1

1

You can use functools.reduce

import functools

def minmax(aa):
    return functools.reduce(lambda mm,xx : ( min(mm[0],xx),max(mm[1],xx)) , aa, ( aa[0],aa[0],))

print(minmax([10,25,5,100,12,32])) # print (5, 100)
napuzba
  • 6,033
  • 3
  • 21
  • 32