Haskell Quick Sort implementation (how to obtain the best performance?)

Question

I am currently reading about functional data structures and algorithms and I tried different quick sort implementations. I noticed, however, that there is a lot of variation in their performances.

Here are some selected ones that I want to discuss (all the programs have been compiled (ghc program.hs) for test (the time in parentheses is the one obtained with the optimization -O flag), and the list to be sorted was the (worst case already sorted (which implies that the time taken is in O(n^2))) [1 .. 20000] list):

qs1 []             = []
qs1 (pivot : rest) = qs1 lower ++ [pivot] ++ qs1 upper  where
  lower = [ x | x <- rest, x <= pivot ]
  upper = [ x | x <- rest, x >  pivot ]

This is the classic sort we see when we learn the language. Here, the rest list is traversed twice to filter first the elements less or equal than the pivot, then the elements greater than the pivot. The time taken was 6.45 (4.42) sec.

qs2 []             = []
qs2 (pivot : rest) = qs2 lower ++ [pivot] ++ qs2 upper  where
  (lower, upper) = foldr
    (\x (l, u) -> if x <= pivot then (x : l, u) else (l, x : u))
    ([], [])
    rest

I was surprised by this one. It is the same as above, except that it only traverses the rest list once. It did worse than the first with a total of 8.11 (2.95) sec. I tried the partition function from Data.List library, but it only did worse (much worse), with a catastrophic 10.99 (9.99) sec. I checked the implementation, and yet it is almost the same as my lambda function, although it relies on an utility function:

partition               :: (a -> Bool) -> [a] -> ([a],[a])
partition p xs = foldr (select p) ([],[]) xs

select :: (a -> Bool) -> a -> ([a], [a]) -> ([a], [a])
select p x ~(ts,fs) | p x       = (x:ts,fs)
                    | otherwise = (ts, x:fs)

(Taken from https://hackage.haskell.org/package/base-4.14.0.0/docs/src/Data.OldList.html#partition).

qs3 []             s = s
qs3 (pivot : rest) s = qs3 lower (pivot : (qs3 upper s))  where
    lower = [ x | x <- rest, x <= pivot ]
    upper = [ x | x <- rest, x >  pivot ]

In this one, there are 2 novelties. Fist, the removal of append ++ in favour of cons :. Second, it is a tail recursion function, so it should (in principle) be faster. However, it is merely better than the first one, with a time of 6.42 (4.44) sec. In fact, due to variation from an execution to the other, it probably is the same.

qs4 []             s = s
qs4 (pivot : rest) s = qs4 lower (pivot : (qs4 upper s))  where
  (lower, upper) = foldr
    (\x (l, u) -> if x <= pivot then (x : l, u) else (l, x : u))
    ([], [])
    rest

Again, this is the same as above, except that I replaced the 2 list traversal by a foldr, and again, the time taken is increased: 8.02 (2.95) sec.

split pivot [] lower upper s = qs5 lower (pivot : qs5 upper s)
split pivot (h : t) lower upper s
     | h <= pivot = split pivot t (h : lower) upper s
     | otherwise  = split pivot t lower (h : upper) s

qs5 []             s = s
qs5 (pivot : rest) s = split pivot rest [] [] s

This one is the fastest. I recorded a stunning time of 2.82 (1.92) sec, which is almost 4 times faster than the “slowest”. It bounces between 2 functions that calls each other. split is a recursive function that separates the elements of the rest list sent by the qs5 functions, which it returns to when it is done partitioning.

Conclusion: What the hell is going on here? I am puzzled by all the hidden subtleties done during the compilation that makes my expectations concerning the performance of the program go wrong. I humbly thank anyone who could help me untangle the pieces of this jigsaw by pointing out what is going on under the hood.

To begin: how you are measuring those times? Ideally, one should use a benchmark library like `criterion`. Alternatively, you can compile your code with optimization on and run the code a few times. GHCi, instead, should never be used to measure performance since it skips many optimizations so to reload code faster. — chi, Jul 04 '20 at 18:29
I compiled all the programs with `ghc program.hs` and timed them with time (i.e. `time ./program`. I ran the programs several times and the difference was never significative (+/- 0.05 sec). I edited the original post to also include times with the `-O` flag, which changes drastically the time taken by each, but still, I would like to know why it goes from one to the other. — Philip Gaudreau, Jul 04 '20 at 19:49
Nice. It's not always easy to reason about performance in Haskell, because of its laziness. `qs4` and `qs5` use a known technique to avoid the slow `++`, using "difference lists" (you can search for that). On top, `qs5` avoids passing pairs in favour of multiple arguments, which could be faster (if the deforestation optimization does not kick in). You could also try `qs4` with ``\x (l, u) -> l `seq` u `seq` if .....`` to see if that changes something, but it's only a wild guess. — chi, Jul 04 '20 at 20:59
Don't use lists, or you will never get the best performance for quicksort. See my answer here: https://stackoverflow.com/questions/19752983/is-it-possible-to-speed-up-a-quicksort-with-par-in-haskell/55885118#55885118 — lehins, Jul 06 '20 at 14:26

Haskell Quick Sort implementation (how to obtain the best performance?)

0 Answers0