5

I am starting to learn Haskell and I've been reading this page on Haskell's wiki, which reports this qsort implementation:

 qsort :: (Ord a) => [a] -> [a]
 qsort []     = []
 qsort (x:xs) = qsort less ++ [x] ++ qsort more
     where less = filter (<x)  xs
           more = filter (>=x) xs

followed by a warning that this is not the most efficient way to do it, and linking an article which shows an incredibly verbose version of the same algorithm. Just by looking at it, I though that that was not what I was learning Haskell for, and I wanted to make a better version of the initial qsort without sacrificing its elegance. I mostly concentrated on eliminating the need to run filter twice every call, and this is what I've come up:

filter' :: (a -> Bool) -> [a] -> ([a], [a])
filter' _ [] = ([], [])
filter' f a = filterInternal a ([], [])
  where
    filterInternal [] p = p
    filterInternal (x:xs) (l, t)
      | f x       = filterInternal xs (x:l, t)
      | otherwise = filterInternal xs (l, x:t)

qsort' :: (Ord a) => [a] -> [a]
qsort' [] = []
qsort' (x:xs) = qsort' less ++ [x] ++ qsort' more
  where
    (more, less) = filter' (>x) xs

But I am not sure this is really better. I mean, it works, but how does it compare to the initial version?

duplode
  • 33,731
  • 7
  • 79
  • 150
  • 2
    Isn't the whole point of doing FP to avoid side-effects and, by extension, in-place algorithms? – Francesco Bertolaccini Jul 28 '15 at 16:32
  • 4
    The "problem" seems to be that some people believe that you can't call it "quicksort" if it isn't an in-place algorithm. For some reason. – MathematicalOrchid Jul 28 '15 at 16:32
  • 1
    [1] [The author of that post](http://stackoverflow.com/users/649287/augustss) was just having fun implementing the in-place quicksort in a way that looks a lot like C. [2] A second inefficiency of the one-liner quicksort is using `(++)` to build the intermediate lists. [3] For an efficient implementation without mutability, you can look at the [`qsort` commented out in `Data.List`](https://hackage.haskell.org/package/base-4.8.0.0/docs/src/Data-OldList.html#sort), which amusingly was originally written by Lennart Augustsson as well. – duplode Jul 28 '15 at 16:52
  • 1
    Wait, isn't he the one behind Haskell as well? Man... I should't have let that glob of code frighten me that much... – Francesco Bertolaccini Jul 28 '15 at 16:55
  • 4
    Note that your `filter'` function already exists as `partition` in the `Data.List` module. – mb14 Jul 28 '15 at 16:58
  • Yeah, I had the feeling I was re-doing something already implemented, but as I said, I am currently learning Haskell; I took this as an exercise of functional-thinking :) – Francesco Bertolaccini Jul 28 '15 at 17:01
  • 3
    For "how does it compare functionally", try [QuickCheck](http://hackage.haskell.org/package/QuickCheck). For "how does it compare in speed", try [criterion](http://hackage.haskell.org/package/criterion). – Daniel Wagner Jul 28 '15 at 17:41
  • I shall do it as soon as I have a Haskell compiler available! Thanks for the suggestions – Francesco Bertolaccini Jul 28 '15 at 17:43

1 Answers1

0

Here's a solution I came up with when pondering roughly the same problem (please point out any caveats!). I haven't thought too much about the space-complexity (shouldn't be too terrible though), just the time. The thing that really kills the naive qsort is the O(n) operation (++). So, lets use a new data structure (which will be foldable) that gives us fast concatenation.

{-# LANGUAGE DeriveFoldable #-}

import Data.Foldable
import Data.List

data Seq a = (Seq a) :++: (Seq a) | Leaf a | Empty deriving (Foldable)

Then, a modified qsort' that returns a Seq a would be

qsort' :: Ord a => [a] -> Seq a
qsort' []     = Empty
qsort' (x:xs) = qsort less :++: Leaf x :++: qsort more
    where (less, more) = partition (<x)  xs

And qsort itself is then just qsort = toList . qsort'.

Note: the fix involving partition gets you a constant factor better, but the ++ vs :++: means that qsort can now be better than O(n^2). Also, most sort implementations out there are better than this. The point of this was only to try to mirror as closely as possible the "naive" implementation.

Alec
  • 31,829
  • 7
  • 67
  • 114
  • 5
    The real problem is not even `(++)`, but the fact that `(++)` doesn't freely re-associate across recursive calls to `qsort`. This problem can be fixed without defining a new data structure with the standard "difference list" trick. The idea is that we manipulate partially-applied `(++)` calls (of type `[a] -> [a]`) rather than flat lists (of type `[a]`); thus: `qsort :: Ord a => [a] -> ([a] -> [a]); qsort [] = id; qsort (x:xs) = qsort less . (x:) . qsort more where { ... }`. (`id = ([]++)` and `(x:) = ([x]++)`.) This reassociates all the `(++)` operations to the right, the efficient direction. – Daniel Wagner Jul 28 '15 at 20:02
  • 1
    Equivalently to @DanielWagner's suggestion, use [`Data.DList`](https://hackage.haskell.org/package/dlist-0.3/docs/Data-DList.html) to build the intermediate lists, which provides a nice abstract interface so you don't even have to see any details of the trick. – luqui Jul 28 '15 at 20:34
  • 1
    Actually the time taken by `(++)` has no effect on the asymptotic complexity of even the original `qsort`. At each level the amount of time it takes to recombine the pieces with `(++)` is no more than it already took to split apart the pieces with `partition`. – Reid Barton Jul 28 '15 at 21:12
  • 1
    [here's another version](https://stackoverflow.com/questions/7641936/is-it-possible-to-do-quicksort-of-a-list-with-only-one-passing/9550430#9550430) of it that's supposed to be faster. – Will Ness Jul 29 '15 at 10:53
  • @ReidBarton yes, it adds O(n^2) ops where it's already O(n^2); but it could've been O(n), so it's bound to affect constant factors. – Will Ness Jul 29 '15 at 12:14
  • 1
    Of course it makes a constant factor difference, but the last paragraph of this answer claims it makes an asymptotic difference. And it's a fairly popular misconception. – Reid Barton Jul 29 '15 at 14:52