
I know that quicksort has O(n log n) average time complexity. A pseudo-quicksort (which is only a quicksort when you look at it from far enough away, with a suitably high level of abstraction) that is often used to demonstrate the conciseness of functional languages is as follows (given in Haskell):

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = quicksort [y | y<-xs, y<p] ++ [p] ++ quicksort [y | y<-xs, y>=p]

Okay, so I know this thing has problems. The biggest problem with this is that it does not sort in place, which is normally a big advantage of quicksort. Even if that didn't matter, it would still take longer than a typical quicksort because it has to do two passes of the list when it partitions it, and it does costly append operations to splice it back together afterwards. Further, the choice of the first element as the pivot is not the best choice.

But even considering all of that, isn't the average time complexity of this quicksort the same as the standard quicksort? Namely, O(n log n)? Because the appends and the partition still have linear time complexity, even if they are inefficient.

Ord
  • In short: No, and use Vector + quicksort on vectors if you want `n Log n`. See my answer [here](http://stackoverflow.com/questions/10357663/python-faster-than-compiled-haskell/10360424#10360424). – Thomas M. DuBuisson Jul 06 '12 at 04:31
  • @ThomasM.DuBuisson I know there are ways to get an efficient Haskell quicksort, but my question was really just: *is this thing `O(n log n)`?* Could you elaborate on why it isn't? The link you gave doesn't really answer that question. – Ord Jul 06 '12 at 04:37
  • Right, it was a comment not an answer because of that omission. I know there's a "Haskell 'quicksort' is not quicksort" analysis out there (PDF) but can't find a link right now. I would expect it to provide a nice, well thought out, answer if anyone can give a pointer. – Thomas M. DuBuisson Jul 06 '12 at 05:57
  • "it would still take longer than a typical quicksort because it has to do two passes of the list when it partitions it" No it wouldn't. – newacct Jul 06 '12 at 19:24

6 Answers


This "quicksort" is actually a deforested tree sort: http://www.reddit.com/r/programming/comments/2h0j2/real_quicksort_in_haskell

data Tree a = Leaf | Node a (Tree a) (Tree a)

-- build a binary search tree by partitioning around the head element
mkTree :: Ord a => [a] -> Tree a
mkTree [] = Leaf
mkTree (x:xs) = Node x (mkTree (filter (<= x) xs)) (mkTree (filter (x <) xs))

The binary tree is not balanced in general, so building the search tree has O(N^2) worst-case and O(N*Log N) average-case complexity.

-- fold over the tree; an in-order flattening yields the sorted list
foldTree :: (a -> b -> b -> b) -> b -> Tree a -> b
foldTree f g Leaf = g
foldTree f g (Node x l r) = f x (foldTree f g l) (foldTree f g r)

treeSort :: Ord a => [a] -> [a]
treeSort l = foldTree (\x lft rht -> lft ++ [x] ++ rht) [] (mkTree l)

The retrieval algorithm has O(N^2) worst-case and O(N*Log N) average-case complexity.
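
As a quick sanity check of the "deforested tree sort" claim (my addition; it assumes the question's quicksort is in scope next to the definitions above), the two functions should produce identical output:

-- hypothetical check, not part of the original answer
agrees :: Bool
agrees = treeSort [3,1,4,1,5,9,2,6] == quicksort [3,1,4,1,5,9,2,6]  -- True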

Well-balanced:

Prelude> let rnds = iterate step where step x = (75*x) `mod` 65537
Prelude> length . quicksort . take 4000 . rnds $ 1
4000
(0.08 secs, 10859016 bytes)
Prelude> length . quicksort . take 8000 . rnds $ 1
8000
(0.12 secs, 21183208 bytes)
Prelude> length . quicksort . take 16000 . rnds $ 1
16000
(0.25 secs, 42322744 bytes)

Not-so-well-balanced:

Prelude> length . quicksort . map (`mod` 10) $ [1..4000]
4000
(0.62 secs, 65024528 bytes)
Prelude> length . quicksort . map (`mod` 10) $ [1..8000]
8000
(2.45 secs, 241906856 bytes)
Prelude> length . quicksort . map (`mod` 10) $ [1..16000]
16000
(9.52 secs, 941667704 bytes)
klapaucius
  • This doesn't seem to answer the question (which is about average-case, not worst-case complexity). Also, the claim that insertion into an unbalanced binary tree is O(n^2) seems wrong to me. – Daniel Wagner Jul 06 '12 at 07:10
  • @DanielWagner, I think the claim is about building the tree, so O(n) insertions. – huon Jul 06 '12 at 07:40
  • @dbaupp It wasn't so obvious before I edited my answer, so DanielWagner's criticism is completely reasonable. – klapaucius Jul 06 '12 at 07:55
  • @DanielWagner Complexity for the retrieval part (catamorphism) is average-case O(N^2). For example, this implementation http://en.literateprograms.org/Quicksort_%28Haskell%29#Using_an_accumulator has O(N) complexity for the retrieval part of the tree sort. Therefore the overall complexity for the "accumulator" implementation is O(N*Log N) (average) and O(N^2) (worst). For the "naive" one it is O(N^2) average-case. – klapaucius Jul 06 '12 at 08:15
  • Why is the average case for the retrieval `O(n^2)`? If the tree is approximately balanced, you get `n log n` levels, and at each level, you have to append many groups of elements together. Since append has time complexity O(n), wouldn't the whole thing have time complexity O(n log n)? This wikipedia article (http://en.wikipedia.org/wiki/Tree_sort) says that a tree sort is average case O(n log n), and even gives a Haskell implementation (with exactly the same retrieval algorithm), but it does say that worst case for that implementation is `O(n^2)` ... – Ord Jul 06 '12 at 12:51
  • Sorry, I meant you get `log n` levels! – Ord Jul 06 '12 at 13:19
  • @Ord, you're right and I was wrong. Average-case complexity for the retrieval actually is O(N*Log N). – klapaucius Jul 07 '12 at 11:16

I agree with your assumption that the average time complexity is still O(n log n). I'm not an expert and not 100% sure, but these are my thoughts:

Here is pseudocode for the in-place quicksort (call Quicksort with l=1 and r=the length of the array):

Quicksort(l,r)  
--------------
IF r-l>=1 THEN  
    choose pivot element x of {x_l,x_l+1,...,x_r-1,x_r}   
    order the array-segment x_l,...x_r in such a way that  
        all elements < x are on the left side of x // line 6  
        all elements > x are on the right side of x // line 7  
    let m be the position of x in the 'sorted' array (as said in the two lines above)  
    Quicksort(l,m-1);  
    Quicksort(m+1,r)  
FI  
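
For comparison, here is roughly what that pseudocode looks like as runnable Haskell on a mutable array. This is my own sketch, not part of the original answer; it uses a Lomuto-style partition with the last element of the segment as pivot, which is just one possible pivot choice:

import Control.Monad (when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, getElems, newListArray, readArray, writeArray)

quicksortInPlace :: [Int] -> [Int]
quicksortInPlace xs = runST $ do
  let n = length xs
  arr <- newListArray (0, n - 1) xs :: ST s (STUArray s Int Int)
  let swap i j = do
        a <- readArray arr i
        b <- readArray arr j
        writeArray arr i b
        writeArray arr j a
      -- move elements smaller than the pivot p to the front of the segment
      scan p r i j
        | j >= r    = return i
        | otherwise = do
            y <- readArray arr j
            if y < p
              then swap i j >> scan p r (i + 1) (j + 1)
              else scan p r i (j + 1)
      go l r = when (l < r) $ do
        p <- readArray arr r             -- pivot: last element of the segment
        m <- scan p r l l
        swap m r                         -- put the pivot into its final place
        go l (m - 1)
        go (m + 1) r
  go 0 (n - 1)
  getElems arr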

The average time complexity analysis then reasons by selecting the "<"-comparisons in lines 6 and 7 as the dominant operation of this algorithm, and concludes that the average time complexity is O(n log n). Since the cost of the step "order the array-segment x_l,...,x_r in such a way that..." is not counted separately (only the dominant operation matters in a time complexity analysis if you want to find bounds), I think "because it has to do two passes of the list when it partitions it" is not a problem: your Haskell version just takes roughly twice as long in this step. The same holds true for the append operation, and I agree with you that this adds nothing to the asymptotic cost:

Because the appends and the partition still have linear time complexity, even if they are inefficient.

For the sake of convenience, let's assume the appends add "n" to our time complexity costs, so that we have O(n log n + n). Since there exists a natural number o such that n log n > n holds for all natural numbers greater than o, you can bound n log n + n from above by 2 n log n and from below by n log n; therefore n log n + n = O(n log n).

Further, the choice of the first element as the pivot is not the best choice.

I think the choice of the pivot element is irrelevant here, because the average-case analysis assumes a uniform distribution of the elements in the array. You can't know in advance from which place in the array you should select it, so you have to consider all the cases in which your pivot element (independently of the place in the list you take it from) is the i-th smallest element of the list, for i = 1, ..., r.

efie

I can offer you a run-time test on Ideone.com which seems to show more or less linearithmic run times both for the (++)-based versions and for the one using the accumulator technique from Landei's answer, as well as for another one using one-pass three-way partitioning. On ordered data this turns quadratic or worse for all of them.

-- random:   100k        200k         400k         800k
-- _O    0.35s-11MB  0.85s-29MB   1.80s-53MB   3.71s-87MB   n^1.3  1.1  1.0
-- _P    0.36s-12MB  0.80s-20MB   1.66s-45MB   3.76s-67MB   n^1.2  1.1  1.2
-- _A    0.31s-14MB  0.62s-20MB   1.58s-54MB   3.22s-95MB   n^1.0  1.3  1.0
-- _3    0.20s- 9MB  0.41s-14MB   0.88s-24MB   1.92s-49MB   n^1.0  1.1  1.1

-- ordered:   230     460     900     1800
-- _P        0.09s   0.33s   1.43s    6.89s                 n^1.9  2.1  2.3
-- _A        0.09s   0.33s   1.44s    6.90s                 n^1.9  2.1  2.3
-- _3        0.05s   0.15s   0.63s    3.14s                 n^1.6  2.1  2.3

quicksortO xs = go xs  where
  go []  =  []
  go (x:xs) = go [y | y<-xs, y<x] ++ [x] ++ go [y | y<-xs, y>=x]

quicksortP xs = go xs  where
  go []  =  []
  go (x:xs) = go [y | y<-xs, y<x] ++ (x : go [y | y<-xs, y>=x])

quicksortA xs = go xs [] where
  go [] acc = acc
  go (x:xs) acc = go [y | y<-xs, y<x] (x : go [y | y<-xs, y>=x] acc)

quicksort3 xs = go xs [] where
  go     (x:xs) zs = part x xs zs [] [] []
  go     []     zs = zs
  part x []     zs a b c = go a ((x : b) ++ go c zs)
  part x (y:ys) zs a b c =
      case compare y x of
                  LT -> part x ys zs (y:a) b c
                  EQ -> part x ys zs a (y:b) c
                  GT -> part x ys zs a b (y:c)

The empirical run-time complexities are estimated here as O(n^a) where a = log(t2/t1) / log(n2/n1). The timings are very approximate, as Ideone isn't very reliable and produces occasional far outliers, but for checking the time complexity it's enough.
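
Written out as code (a throwaway helper of mine, not part of the tests), that estimate is:

-- empirical order of growth from two (size, time) samples
orderOfGrowth :: Double -> Double -> Double -> Double -> Double
orderOfGrowth n1 t1 n2 t2 = log (t2 / t1) / log (n2 / n1)

-- e.g. orderOfGrowth 100000 0.35 200000 0.85 ~= 1.28, the "n^1.3" above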

Thus these data seem to indicate that the one-pass partition is 1.5x-2x faster than the two-pass schemes, and that using (++) is not slowing things down at all; i.e. the "append operations" are not "costly" at all. The quadratic behaviour of (++)/append seems to be an urban myth, in the Haskell context of course (edit: ... i.e. in the context of guarded recursion/tail recursion modulo cons; cf. this answer) (update: as user:AndrewC explains, it really is quadratic with left folding, and linear when (++) is used with right folding; more about this here and here).
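
To make that left-fold/right-fold distinction concrete (my sketch, not from the original answer):

-- quadratic: every step copies the whole accumulated prefix again
joinLeft :: [[a]] -> [a]
joinLeft = foldl (++) []

-- linear overall: each (++) just prepends one chunk onto a lazily built tail
joinRight :: [[a]] -> [a]
joinRight = foldr (++) []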


later addition: to be stable, the three-way partitioning quicksort version should also build its parts in a top-down manner:

q3s xs = go xs [] where
  go     (x:xs) z = part x xs  go  (x:)  (`go` z)
  go     []     z = z
  part x []      a  b  c = a [] (b (c []))
  part x (y:ys)  a  b  c =
      case compare y x of
                  LT -> part x ys  (a . (y:))  b  c
                  EQ -> part x ys  a  (b . (y:))  c
                  GT -> part x ys  a  b  (c . (y:))

(performance not tested).
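
One way to observe that stability (a hypothetical check of mine: a key type whose ordering deliberately ignores the payload):

data KV = KV Int Char deriving Show

instance Eq KV where
  KV a _ == KV b _ = a == b

instance Ord KV where
  compare (KV a _) (KV b _) = compare a b

-- q3s [KV 1 'a', KV 0 'z', KV 1 'b']
--   should give [KV 0 'z', KV 1 'a', KV 1 'b'], keeping 'a' before 'b'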

Will Ness
  • +1 for excellent decision to use actual performance data. You can get quadratic behaviour by using `(++)` repeatedly with `foldl` or the manual `foldl` of a badly-written recursive String-generating and appending function. I can vouch for this being a non-myth, because it was in my first major real-world Haskell program, back before Haskell 98, and it didn't output anything for ages until I realised my mistake and corrected it. If you use it sensibly, of course, with foldr, it's linear. – AndrewC Feb 09 '13 at 21:20

I don't know how much this improves the runtime complexity, but by using an accumulator you can avoid the expensive (++):

quicksort xs = go xs [] where
  go [] acc = acc
  go (x:xs) acc = go [y | y<-xs, y<x] (x : go [y | y<-xs, y>=x] acc)
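
(The invariant, as I read it, is that `go xs acc` computes the sorted `xs` followed by `acc`: each pivot is consed directly onto the sorted greater part and the accumulated rest, so no list ever has to be rebuilt with (++).)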
Landei

Look here for a true O(n log n) quicksort that will work on both arrays and lists: http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.23.4398&rep=rep1&type=pdf It is quite easy to implement in Common Lisp, and it outperforms the sort implementations of many commercial Lisps.

FabPop
  • Interesting, but I'm not sure how this answers the actual question in any way; the question was about a *specific algorithm* that superficially resembles quicksort, not about implementing a true quicksort. – C. A. McCann Sep 04 '12 at 15:31

Yes, this version has the same asymptotic complexity as the classic version: you replace the linear-time partition with two passes (one for < and one for >=), and you add the linear-time ++ (which involves linear re-allocation/copying). So it is a hefty constant factor worse than an in-place partition, but still linear. All the other aspects of the algorithm are the same, so the same analysis that gives O(n log n) average-case for the "true" (i.e. in-place) quicksort still holds here.

not-just-yeti
  • `(++)` is not linear in time complexity due to lazy evaluation. Also `<` and `>=` can be easily replaced by `partition` from `Data.List` which makes only one pass. – is7s Sep 04 '12 at 15:34
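
For illustration, the single-pass variant suggested in the last comment could look like this (a sketch of mine, not from any of the answers):

import Data.List (partition)

quicksortP1 :: Ord a => [a] -> [a]
quicksortP1 []     = []
quicksortP1 (p:xs) = quicksortP1 lt ++ p : quicksortP1 ge
  where (lt, ge) = partition (< p) xs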