Why is the performance of these two functions so drastically different despite the same Big-O?

Question

I'm learning about Big-O notation, not for a class, just learning by reading on my own. I'm reading through Pythonds and did the exercise where you're tasked with writing a "not optimal" Python function to find the minimum number in a list. The function should compare each number to every other number on the list: O(n**2).

Here's what I came up with:

def myFindmin(alist):
    return[x for x in alist if all(x<=y for y in alist)][0]

And here's what the book I'm reading gave:

def findmin(alist):
    overallmin=alist[0]
    for i in alist:
        issmallest=True
        for j in alist:
            if i>j:
                issmallest=False
        if issmallest:
            overallmin=i
    return overallmin

Obviously, the book version cuts more variables and such but, according to what I've learned, the Big-O notation fror both of these functions should be O(n**2), no? Both of them compare each number to every other number which makes n**2 the dominant part of the function, yeah?

However, when I compare my myFindmin() function to the book's findmin() function to the optimal min() function, I get three very different results:

if __name__=='__main__':
    for l in range(1000,10001,1000):
        thelist=[randrange(100000)for x in range(l)]
        print('size: %d'%l)
        for f in[findmin,myFindmin,min]:
            start=time()
            r=f(thelist)
            end=time()
            print('    function: %s \ttime: %f'%(f.__name__,end-start))

They're not even close:

...
size: 8000
    function: findmin   time: 1.427166
    function: myFindmin time: 0.004312
    function: min       time: 0.000138
size: 9000
    function: findmin   time: 1.830869
    function: myFindmin time: 0.005525
    function: min       time: 0.000133
size: 10000
    function: findmin   time: 2.280625
    function: myFindmin time: 0.004846
    function: min       time: 0.000145

As you can see, myFindmin() is not as fast as the optimal linear O(n) min() function but still WAY faster than the O(n**2) findmin() function. I would think myFindmin should be O(n**2) but it seems to be neither O(n) or O(n**2) so what the heck is happening here? What's the Big-O for myFindmin?

UPDATE

If I add brackets to the nested loop in the all statement:

def myFindmin(alist):
    return[x for x in alist if all([x<=y for y in alist])][0]

This actually makes myFindmin consistently SLOWER than findmin:

size: 8000
    function: findmin   time: 1.512061
    function: myFindmin time: 1.846030
    function: min       time: 0.000093
size: 9000
    function: findmin   time: 1.925281
    function: myFindmin time: 2.327998
    function: min       time: 0.000104
size: 10000
    function: findmin   time: 2.390210
    function: myFindmin time: 2.922537
    function: min       time: 0.000118

So what's happening here is that, in the original myFindmin, the entire list is not generated via list comprehension, it's actually generated by all() itself via generator expression which is also performing lazy evaluation, meaning it stops evaluating AND generating as soon as it finds a false value.

If I add brackets, then what happens is, the list gets generated and evaluated via list comprehension which does not perform lazy evaluation. The whole list gets generated every time before it gets passed to all() for a lazy reevaluation.

Since original myFindmin() has a Big-O of O(nlogn), the new myFindmin() (with brackets) would have a Big-O of O(n^2+nlogn) which is reflected in the resulting times. For an explanation on why the original myFindmin() is O(nlogn), see Amit's answer which I have marked as Best Answer.

Thanks all!

Implementation of list comprehension is different than implementation of for loops under the hood. list comprehensions are faster under certain circumstances. https://medium.com/quick-code/python-one-liner-list-comprehension-24e35a20bd1d — IjonTichy, Jun 25 '20 at 17:55
`all()` is only O(n) in the worst case. It exits early if it finds a falsy value. — jasonharper, Jun 25 '20 at 17:56
@IjonTichy that wouldn't explain this level of difference. And a list-comprehension is *essentially* a for-loop under the hood, with minor optimizations. The differences are marginal, not orders of magnitude — juanpa.arrivillaga, Jun 25 '20 at 18:02
@IjonTichy a lot is wrong with that article, for example, list comprehensions don't use less memory, especially if you don't need to build a list... second, they aren't "much" faster, also, "List comprehensions can contain complex expressions and nested functions" so can for-loops, indeed, for-loops can contain arbitrary, complex statements... so it is *more* flexible. — juanpa.arrivillaga, Jun 25 '20 at 18:03
@IjonTichy The original myFindmin() function didn't use list comprehension, turns out. Without nested brackets, all() both creates and evaluates the list. — Jon Red, Jun 25 '20 at 18:42
@juanpa.arrivillaga Actually it does. List comprehension creates the list when nested brackets are added. Without nested brackets, **all** is doing *both* creation *and* evaluation, no list comprehension involved. List comprehension does not perform lazy evaluation, but **all** does. That's why adding brackets changes things so dramatically. Read the update. — Jon Red, Jun 25 '20 at 18:59
@JonRed no, without brackets, it is a generator expression, which produced a generator, which is passed to `all`. `all` isn't doing anything except basically a for loop and returning early, it simply iterates over what is passed in — juanpa.arrivillaga, Jun 25 '20 at 19:02
@juanpa.arrivillaga Correct, exactly what I've been saying. A generator expression is not the same as list comprehension. Thus, list comprehension is not involved in the original myFindmin. — Jon Red, Jun 25 '20 at 19:08

amit · Accepted Answer · 2020-06-25T18:05:21.207

5

Your code is actually O(nlogn) (average case) and not O(n^2).

Take a look on all(x<=y for y in alist), and recall that to yield False, it is enough that one element is bigger than x, no need to go through all values of alist.

Assume your list is shuffled randomly (and uniformly), and let's examine how many elements needs to be traversed:

x is the highest element: traverse n elements
x is the 2nd highest element: traverse n/2 elements
x is the 3rd highest element: traverse n/3 elements
....
x is the smallest element: traverse 1 element

So, the actual complexity of your algorithm is:

T(n) = n/1 + n/2 + n/3 + ... + n/n =
     = n(1 + 1/2 + 1/3 + .... + 1/n)

Now note that 1 + 1/2 + .... + 1/n is the sum of harmonic series, and it is in O(logn)

This gives you complexity of O(nlogn) and not O(n^2) for average case complexity.

edited Jun 25 '20 at 18:05

answered Jun 25 '20 at 17:52

amit

175,853
27
231
333

Well this sounded right... but to prove it out I changed all(x<=y for y in alist) to all([x<=y for y in alist]) to make sure the whole list gets created. This brought myFindmin in line with findmin but why is all() not ditching the list when fed the list in this manner? – Jon Red Jun 25 '20 at 18:19
@JonRed I sorry, I could not understand this: `This brought myFindmin in line with findmin but why is all() not ditching the list when fed the list in this manner?`. (not the statement nor the question). My apologies, I am not native English speaker. Can you please rephrase? – amit Jun 25 '20 at 18:24
I think I see what's happening. In short, you're right. Without brackets, the comparison is done via lazy evaluation by all(). By adding brackets, the evaluation is performed on every value in the list via list comprehension, which is then simply read by all(). Which is why adding brackets brings myFindmin() to O(n^2). – Jon Red Jun 25 '20 at 18:31
1

@JonRed because it *first creates a list*, then passes that list to `all`. Whereas without brackets, it creates a lazily evaluated generator, which gets passed to `all`. `all` (or any other function) has no idea what the *expression* was that you passed in to it, just what that expression evaluates to. By the time it gets to `all` it is already whatever the expression produces. – juanpa.arrivillaga Jun 25 '20 at 18:46

Why is the performance of these two functions so drastically different despite the same Big-O?

1 Answers1