I was looking at the Stack Overflow question Non-Trivial Lazy Evaluation, which led me to Keegan McAllister's presentation Why learn Haskell. In slide 8, he shows the minimum function, defined as:

minimum = head . sort

and states that its complexity is O(n). I don't understand why the complexity is said to be linear when comparison-based sorting is O(n log n). The sort referred to in the post can't be linear, since it assumes nothing about the data, as would be required by linear-time sorting methods such as counting sort.

Is lazy evaluation playing a mysterious role in here? If so, what is the explanation behind it?

– leco
  • Note that that slide rightfully states "for careful `sort` implementations". It's easy to write a `sort` for which this composition takes `O(n log n)` time or worse, but not for the reasons you think. – delnan Aug 21 '12 at 15:01

7 Answers

In minimum = head . sort, the sort won't be done fully, because it won't be done upfront. The sort will only be done as much as needed to produce the very first element, demanded by head.

In e.g. mergesort, the n numbers of the list are first compared pairwise, then the n/2 winners are paired up and compared, then the n/4 new winners, and so on. In all, that's n/2 + n/4 + n/8 + … < n comparisons, i.e. O(n), to produce the minimal element.

mergesortBy less [] = []
mergesortBy less xs = head $ until (null.tail) pairs [[x] | x <- xs]
  where
    -- repeatedly merge adjacent lists pairwise, until one sorted list remains
    pairs (x:y:t) = merge x y : pairs t
    pairs xs      = xs
    -- standard lazy merge of two sorted lists
    merge (x:xs) (y:ys) | less y x  = y : merge (x:xs) ys
                        | otherwise = x : merge  xs (y:ys)
    merge  xs     []                = xs
    merge  []     ys                = ys
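
A quick sanity check in GHCi (my session, not part of the original answer):

*Main> mergesortBy (<) [5,8,1,3,0]
[0,1,3,5,8]

*Main> head $ mergesortBy (<) [5,8,1,3,0]
0

Thanks to laziness, the second call performs only the comparisons needed to produce the first element.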

The above code can be augmented to tag each number it produces with the number of comparisons that went into its production:

mgsort xs = go $ map ((,) 0) xs  where
  go [] = []
  go xs = head $ until (null.tail) pairs [[x] | x <- xs]   where
    pairs (x:y:t) = merge x y : pairs t
    pairs xs      = xs
    merge ((a,b):xs) ((c,d):ys)
            | (d < b)   = (a+c+1,d) : merge ((a+1,b):xs) ys    -- cumulative
            | otherwise = (a+c+1,b) : merge  xs ((c+1,d):ys)   --   cost
    merge  xs     []                = xs
    merge  []     ys                = ys

g n = concat [[a,b] | (a,b) <- zip [1,3..n] [n,n-2..1]]   -- a little scrambler
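
(For instance, g 10 yields [1,10,3,8,5,6,7,4,9,2], interleaving small and large values; my computation, shown to make the scrambling concrete.)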

Running it for several list lengths, we see that it is indeed ~ n:

*Main> map (fst . head . mgsort . g) [10, 20, 40, 80, 160, 1600]
[9,19,39,79,159,1599]

To see whether the sorting code itself is ~ n log n, we change it so that each produced number carries along just its own cost, and the total cost is then found by summation over the whole sorted list:

    merge ((a,b):xs) ((c,d):ys) 
            | (d < b)   = (c+1,d) : merge ((a+1,b):xs) ys      -- individual
            | otherwise = (a+1,b) : merge  xs ((c+1,d):ys)     --   cost

Here are the results for lists of various lengths,

*Main> let xs = map (sum . map fst . mgsort . g) [20, 40, 80, 160, 320, 640]
*Main> xs
[138,342,810,1866,4218,9402]

*Main> map (logBase 2) $ zipWith (/) (tail xs) xs
[1.309328,1.2439256,1.2039552,1.1766101,1.1564085]

The above shows the empirical orders of growth for increasing lengths of list n: each entry is the base-2 logarithm of the ratio between successive costs as the length doubles. These keep diminishing, as is typically exhibited by ~ n log n computations. See also this blog post. Here's a quick correlation check:

*Main> let xs = [n*log n | n<- [20, 40, 80, 160, 320, 640]] in 
                                    map (logBase 2) $ zipWith (/) (tail xs) xs
[1.3002739,1.2484156,1.211859,1.1846942,1.1637106]

edit: Lazy evaluation can metaphorically be seen as a kind of producer/consumer idiom1, with independent memoizing storage as an intermediary. Any productive definition we write defines a producer which will produce its output, bit by bit, as and when demanded by its consumer(s), but not sooner. Whatever is produced is memoized, so that if another consumer consumes the same output at a different pace, it accesses the same storage, filled previously.

When no more consumers remain that refer to a piece of storage, it gets garbage collected. Sometimes, with optimizations, the compiler is able to do away with the intermediate storage completely, cutting out the middleman.
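
Here's a tiny illustration of that memoization (my sketch, not from the answer; Debug.Trace marks each element as it is actually produced):

import Debug.Trace

-- Two consumers, one producer: each "produced i" message appears at
-- most once, because a forced list cell is memoized and then shared.
main :: IO ()
main = do
  let xs = map (\i -> trace ("produced " ++ show i) (i * i)) [1 .. 5 :: Int]
  print (take 3 xs)  -- demands cells 1..3: three messages, then [1,4,9]
  print (head xs)    -- reuses the memoized first cell: no new message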

1 see also: Simple Generators v. Lazy Evaluation by Oleg Kiselyov, Simon Peyton Jones and Amr Sabry.

– Will Ness

Suppose minimum' :: (Ord a) => [a] -> (a, [a]) is a function that returns the smallest element in a list along with the list with that element removed. Clearly this can be done in O(n) time. If you then define sort as

sort :: (Ord a) => [a] -> [a]
sort xs = xmin:(sort xs')
    where
      (xmin, xs') = minimum' xs

then lazy evaluation means that in (head . sort) xs only the first element is ever computed. This element is, as you see, simply (the first element of) the pair minimum' xs, which is computed in O(n) time.
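
For concreteness, here is one way minimum' could be written (my sketch; the answer deliberately leaves it abstract):

-- One O(n) pass finds the smallest element; removing its first
-- occurrence is a second O(n) pass, left unevaluated until demanded.
minimum' :: (Ord a) => [a] -> (a, [a])
minimum' xs = (m, deleteFirst m xs)
  where
    m = foldr1 min xs
    deleteFirst y (z:zs)
      | y == z    = zs
      | otherwise = z : deleteFirst y zs
    deleteFirst _ [] = []

The key point is that the pair's first component is computable without ever forcing the deletion.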

Of course, as delnan points out, the complexity depends on the implementation of sort.

– gspr
  • You made your point, but just note that your sorting isn't O(n log n). Still, I got it ;) – leco Aug 21 '12 at 15:40
  • @LeonardoPassos that's the beauty of this answer. :) It is a selection sort, O(n^2), yet it produces the minimum in O(n) time. – Will Ness Aug 21 '12 at 15:54
  • This example is a bit misleading, as we want to use `(head . sort)` as a minimum function in O(n), but your sort requires such a minimum function. It is more interesting to gain an O(n) minimum from a sort that does *not* already use such a function. – Joachim Breitner Aug 30 '12 at 07:26
  • @JoachimBreitner: I agree. But I'd perhaps call it contrived instead of misleading :) – gspr Aug 30 '12 at 09:14
  • [this other answer](https://stackoverflow.com/a/12058086/849891) avoids this problem by separating the finding of the minimum and its removal, which becomes trivial to do in O(n) when the minimal element is already known. – Will Ness Apr 07 '20 at 09:57

You've gotten a good number of answers that tackle the specifics of head . sort. I'll just add a couple of more general statements.

With eager evaluation, the computational complexities of various algorithms compose in a simple manner. For example, the least upper bound (LUB) for f . g must be the sum of the LUBs for f and g. Thus you can treat f and g as black boxes and reason exclusively in terms of their LUBs.

With lazy evaluation, however, f . g can have a LUB better than the sum of f and g's LUBs. You can't use black-box reasoning to prove the LUB; you must analyze the implementations and their interaction.
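
Here's a concrete instance of that (my example, in the spirit of head . sort): for careful lazy sort implementations, as the slide in the question puts it, taking the first k elements does only roughly O(n + k log n) work, beating the O(n log n) that black-box reasoning about sort alone would predict.

import Data.List (sort)

-- Only as much of the (carefully lazy) sort is forced as `take k`
-- demands, so the composition is cheaper than its parts suggest.
kSmallest :: Ord a => Int -> [a] -> [a]
kSmallest k = take k . sort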

Thus the often-cited fact that the complexity of lazy evaluation is much harder to reason about than that of eager evaluation. Just think about the following. Suppose you're trying to improve the asymptotic performance of a piece of code whose form is f . g. In an eager language, there's an obvious strategy you can follow to do this: pick the more complex of f and g, and improve that one first. If you succeed at that, you succeed at the f . g task.

In a lazy language, on the other hand, you can have these situations:

  • You improve the more complex of f and g, but f . g doesn't improve (or even gets worse).
  • You can improve f . g in ways that don't help (or even worsen) f or g.
– Luis Casillas

The explanation depends on the implementation of sort, and for some implementations it will not be true. For instance, with an insertion sort that inserts at the end of the list, lazy evaluation does not help. So let's choose an implementation to look at, and for the sake of simplicity, let's use selection sort:

import Data.List (delete)

sort [] = []
sort (x:xs) = m : sort (delete m (x:xs))
  where m = foldl (\x y -> if x < y then x else y) x xs   -- the minimum of x:xs, in O(n)

The function clearly uses O(n^2) time to sort the list, but since head only needs the first element, the recursive call sort (delete m (x:xs)) is never evaluated!
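
One way to see this is to unfold the computation by hand (my derivation):

head (sort (x:xs))
  = head (m : sort (delete m (x:xs)))   -- unfold sort one step
  = m                                   -- head never demands the tail thunk

Only the O(n) fold that computes m ever runs.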

– HaskellElephant

It's not so mysterious. How much of a list do you need to sort to deliver the first element? You need to find the minimal element, which can easily be done in linear time. As it happens, for some implementations of sort lazy evaluation will do this for you.

– augustss

One interesting way of seeing this in practice is to trace the comparison function.

import Debug.Trace
import Data.List

myCmp x y = trace (" myCmp " ++ show x ++ " " ++ show y) $ compare x y

xs = [5,8,1,3,0,54,2,5,2,98,7]

main = do
    print "Sorting entire list"
    print $ sortBy myCmp xs

    print "Head of sorted list"
    print $ head $ sortBy myCmp xs

First, notice the way in which the output of the entire list is interleaved with the trace messages. Second, notice how much shorter the run of trace messages is when merely computing the head.

I've just run this through GHCi, and it's not exactly O(n): it takes 15 comparisons to find the first element, not the 10 that ought to be required. But it's still less than O(n log n).

Edit: as Vitus points out below, taking 15 comparisons instead of 10 is not the same as saying it's not O(n). I just meant that it takes more than the theoretical minimum.

– Paul Johnson
  • _O(n)_ does not mean that 10 comparisons are used. Even if you used 50 comparisons, you can't say it's not _O(n)_, because _O(5n)_ = _O(n)_. You have to look at how the number of comparisons changes with the length of the input instead. – Vitus Aug 21 '12 at 16:46
  • +1 This is a good answer (except for the slight misguidedness re O(n)), I don't understand why it was downvoted. – huon Aug 21 '12 at 19:39
  • Here's a graph of the comparison counts: http://imgur.com/vfEPp. Blue line is comparisons when sorting the whole list, green line - getting the head. – Daniel Aug 22 '12 at 00:35
  • @DanielVelkov you should add this image as an answer here. – Will Ness Aug 22 '12 at 06:58
  • I second that. Awesome graph. Also it nicely demonstrates how close O(n log n) is to O(n). What is the slope of the lower line, incidentally? It looks like 1, but it's hard to see. – Paul Johnson Aug 22 '12 at 17:13
  • Ok, I'll make a separate answer with the graph and how to create it. I only have to figure out how to change the aspect ratio in matplotlib. – Daniel Aug 22 '12 at 18:31
  • @PaulJohnson: It's somewhere around 1.4 (49/70 pixels * 2), which is more or less the result you would get if you counted comparisons used by `sort` from `Data.List` on lists produced by `randoms`. – Vitus Aug 22 '12 at 18:48

Inspired by Paul Johnson's answer I plotted the growth rates for the two functions. First I modified his code to print one character per comparison:

import System.Random
import Debug.Trace
import Data.List
import System.Environment

rs n = do
    gen <- newStdGen
    let ns = randoms gen :: [Int]
    return $ take n ns

cmp1 x y = trace "*" $ compare x y
cmp2 x y = trace "#" $ compare x y

main = do
    n <- fmap (read . (!!0)) getArgs
    xs <- rs n
    print "Sorting entire list"
    print $ sortBy cmp1 xs

    print "Head of sorted list"
    print $ head $ sortBy cmp2 xs
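
(Note: trace writes its output to stderr, which is why the script below counts characters in r.std_err. I'm also assuming the program is compiled to the ./sortCount binary that envoy.run invokes, e.g. with ghc -O2 sortCount.hs.)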

Counting the * and # characters, we can sample the comparison counts at evenly spaced points (excuse my Python):

import matplotlib.pyplot as plt
import numpy as np
import envoy

res = []
x = range(10,500000,10000)
for i in x:
    r = envoy.run('./sortCount %i' % i)
    res.append((r.std_err.count('*'), r.std_err.count('#')))

plt.plot(x, map(lambda x:x[0], res), label="sort")
plt.plot(x, map(lambda x:x[1], res), label="minimum")
plt.plot(x, x*np.log2(x), label="n*log(n)")
plt.plot(x, x, label="n")
plt.legend()
plt.show()

Running the script would give us the following graph:

[graph: comparison counts for "sort" and "minimum" versus list length, together with n*log2(n) and n reference lines]

The slope of the lower line is..

>>> import numpy as np
>>> np.polyfit(x, map(lambda x:x[1], res), deg=1)
array([  1.41324057, -17.7512292 ])

..1.41324057 (assuming it's a linear function)

– Daniel
  • Your "n*log(n)" line uses logs to base e (2.718…), but sort functions generally increase with log base 2 (because of the binary split imposed by the comparison operator). According to the Python docs, you can pass the base to "log" as a second argument, in which case I suspect your red and blue lines will coincide. It would also be interesting to compare your "minimum" line (which I assume is actually for "head . sortBy cmp2") with a linear "(n-1)" line, which is the theoretical minimum number of comparisons for "minimum". – Paul Johnson Aug 28 '12 at 11:40