6

I am learning haskell and the function definition I see is:

quickSort (x : xs) = (quickSort less) ++ (x : equal) ++ (quickSort more)
                 where less = filter (< x) xs
                       equal = filter (== x) xs
                       more = filter (> x) xs

Is it possible to write it with only one traversal of the list, instead of 3?

Will Ness
  • 70,110
  • 9
  • 98
  • 181
Salil
  • 9,534
  • 9
  • 42
  • 56
  • Quicksort has a `O(n lg n)` complexity on average... – Etienne de Martel Oct 03 '11 at 23:56
  • 2
    The complexity is about the number of comparisons made and the above version will have 3 times more comparisons than the one that partitions the list by traversing through it once. – Salil Oct 04 '11 at 02:43
  • why does it matter how many times it traverses the list? the complexity is the same – newacct Oct 04 '11 at 03:57
  • 1
    @newacct, it is not merely traversing the list; it is comparing every element while traversing; that is why. – Salil Oct 04 '11 at 04:03
  • @Salil: right, but `O(3n log n) == O(n log n)` so the runtime complexity is the same (the actual _runtime_ might **not** be the same, but that's different from the complexity). – ivanm Oct 04 '11 at 05:39
  • 2
    @ivanm, Yes the complexity is the same, and it is O(n log n) on average. But it is O(n^2), worst case. However the question is legitimate regardless of this fact, and implementations of algorithms are usually measured by the size of the constant factor. – HaskellElephant Oct 04 '11 at 06:34
  • 1
    @ivanm, O(n) notation is a way to measure complexity. It does not mean the complexity is the same. Suppose I measure distances by miles to make it easier for me. It does not mean it takes me the same time to travel to the next door and to a mall nearby. – Salil Oct 04 '11 at 06:55

3 Answers3

13

You mean something like this?

quicksort [] = []
quicksort (x:xs) = quicksort less ++ (x : equal) ++ quicksort more
  where (less, equal, more) = partition3 x xs

partition3 _ [] = ([], [], [])
partition3 x (y:ys) =
  case compare y x of
    LT -> (y:less, equal, more)
    EQ -> (less, y:equal, more)
    GT -> (less, equal, y:more)
  where (less, equal, more) = partition3 x ys

Note that this isn't really quicksort, as the real quicksort is in-place.

hammar
  • 138,522
  • 17
  • 304
  • 385
  • That was neat. The original algorithm I referred to also does not do in-place quicksort. I am assuming that in-place means mutable version where the list is modified in-place. – Salil Oct 04 '11 at 02:34
  • isn't this the "Wadler pair" problem that's supposed to leak space? (I've posted another version that's supposed to not have this problem, done according to his technique IIRC). – Will Ness Mar 03 '12 at 22:31
  • @WillNess: Your version is certainly more efficient, especially since it's using difference lists instead of `(++)` for appending. Mine might allocate a bit more since it's using tuples, but I don't think it should leak space. I'm not familiar with the "Wadler pair" problem, though. Do you have a link? Google didn't seem to know about it either. – hammar Mar 03 '12 at 23:04
  • google "Wadler pair space leak" basically it's about `let (a,b)=span ...` kind of thing, being translated into `let p=span ...; a=fst$p; b=snd$p` which leads to `p` being kept for `b` even when `a` is consumed already. *Pairs*... :) (I mean, *tuples* ... IIRC this is exactly what this "leak" stuff was meant to be about). – Will Ness Mar 04 '12 at 00:09
  • @WillNess: Ah yes, I see what you mean. I'm not sure I'd call it a space leak, though, since it only causes the tails of the three lists to be kept, which it has to do anyway, in addition to the tuples themselves. If I was doing something else with those lists rather than just consing to the front of them, that could definitely be a leak, though. – hammar Mar 04 '12 at 00:19
  • I think it's supposed to cause the `lesser` to still be kept even after it's consumed and sorted, as part of a triple holding `bigger` also. Seems like a leak. :) Or maybe GHC with `-O2` is smarter than that. :) – Will Ness Mar 04 '12 at 00:25
  • @WillNess: True. Perhaps you could add some text to your answer summarizing the problems with my solution and how yours improves on that? I think that would be useful to future readers. – hammar Mar 04 '12 at 00:31
  • 1
    btw I think your version is true idiomatic Haskell solution; too bad a compiler can't do that transformation automatically that I had to code by hand. We naturally express parallel "assignment" with *tuples* of vars; it's Haskell's fault for not having an explicit concept and forcing us to use a *kludge*, and then punishing us for it... – Will Ness Mar 05 '12 at 00:10
  • @Salil "in-place" usually refers to array entries' values being modified, as opposed to making array's copy with better-ordered elements. For lists, when "cons" cells are explicitly "re-linked" (e.g. in Common LISP etc.) keeping the cells contents intact, it can also be seen as "in-place" as no new storage is created in the process. – Will Ness May 15 '12 at 08:56
9

It does not seem to improve anything but:

qs (x:xs) = let (a,b) = partition (< x) xs in (qs a) ++ [x] ++ (qs b)
qq191
  • 91
  • 1
  • That was great. Why do you think it is not improving anything? It is traversing the list only once. So, the comparisons are much lesser. – Salil Oct 04 '11 at 01:57
  • @qq191: this definition assumes there are no repeated values, which is why the original has _three_ filters! – ivanm Oct 04 '11 at 05:37
  • @JonasDuregård: wait, you're right; they go into the `b` partition. – ivanm Oct 04 '11 at 10:23
8

Although late, here's a version that's supposed to not leak space as much (and seems to run about twice faster than the other 3-way version here):

qsort3 xs = go xs [] 
  where
    go     (x:xs) zs       = part x xs zs [] [] []
    go     []     zs       = zs
    part x []     zs a b c = go a ((x : b) ++ go c zs)
    part x (y:ys) zs a b c =
        case compare y x of
                  LT -> part x ys zs (y:a) b c
                  EQ -> part x ys zs a (y:b) c
                  GT -> part x ys zs a b (y:c)

This addresses the possible problem with using tuples, where let (a,b) = ... is actually translated into let t= ...; a=fst t; b=snd t which leads to the situation where even after a has been consumed and processed, it is still kept around alive, as part of the tuple t, for b to be read from it - though of course completely unnecessary. This is known as "Wadler pair space leak" problem. Or maybe GHC (with -O2) is smarter than that. :)

Also this apparently uses difference lists approach (thanks, hammar) which also makes it a bit more efficient (about twice faster than the version using tuples). I think part uses accumulator parameters, as it builds them in reversed order.

Will Ness
  • 70,110
  • 9
  • 98
  • 181
  • cf. a *stable sort* variant at the bottom of [this answer](https://stackoverflow.com/a/11373542/849891). – Will Ness Oct 09 '18 at 07:55
  • 1
    Thanks for this solution Will. It helped me with an assignment in an algorithms course which allows submission in Haskell but has a lot of assignments that aren't very Haskell-friendly. This one was meant to use a random pivot to efficiently sort lists with many equal elements, but that wasn't feasible with Haskell since the autograder doesn't allow non-base imports. This technique was still fast enough to pass the grader in less than half a second without a random pivot. – arsalanQ Nov 07 '21 at 03:08
  • @arsalanQ nice, thanks for sharing. :) – Will Ness Nov 07 '21 at 05:12