1

In teaching myself haskell, I'm trying to implement quicksort. For convenience, I'm limiting myself to packages included with ghc, which is why I'm using Array instead of Vector:

import Data.Array

type ArrayT = Array Int

quickSort :: Ord t => ArrayT t -> ArrayT t
quickSort a = rQuickSort (bounds a) a

rQuickSort :: Ord t => (Int, Int) -> ArrayT t -> ArrayT t
rQuickSort (beg,end) arr | beg >= end = arr
rQuickSort (beg,end) arr = 
  rQuickSort (beg, pivIndex) . rQuickSort (pivIndex+1, end) $ parted 
  where
    (pivoted, pivValue) = med3pivot (beg, end) arr
    (parted , pivIndex) = partition (beg+1, end-1) pivoted pivValue

med3pivot :: Ord t => (Int, Int) -> ArrayT t -> (ArrayT t, t)
med3pivot (beg, end) = 
  let mid = (beg + end) `div` 2 
    in (\arr -> (arr,arr!mid))
    . (\arr -> if arr!end < arr!mid then swap end mid arr else arr)
    . (\arr -> if arr!mid < arr!beg then swap mid beg arr else arr)
    . (\arr -> if arr!end < arr!beg then swap end beg arr else arr)
    
partition :: Ord t => (Int, Int) -> ArrayT t -> t -> (ArrayT t, Int)
partition (beg, end) arr piv =
  let fw = findForward  (piv <) (beg,end) arr
      bw = findBackward (piv >) (beg,end) arr
    in if fw >= bw then (arr, bw) 
                   else partition (fw+1, bw-1) (swap fw bw arr) piv

findForward :: (t -> Bool) -> (Int, Int) -> ArrayT t -> Int
findForward _ (b,e) _ | b > e = b
findForward p (b,e) a = if p (a!b) then b else findForward p (b+1,e) a

findBackward :: (t -> Bool) -> (Int, Int) -> ArrayT t -> Int
findBackward _ (b,e) _ | b > e = e
findBackward p (b,e) a = if p (a!e) then e else findBackward p (b,e-1) a 

swap :: Int -> Int -> ArrayT t -> ArrayT t
swap i j arr = arr // [(i, arr!j), (j, arr!i)]

mkArray :: [a] -> Array Int a
mkArray xs = listArray (0, length xs - 1) xs

I'm clearly getting this wrong. I get a running time of around 8s for sorting an array of 10000 short Bytestrings (omitting the driving code for now, because this already is a bunch). I think I don't properly understand laziness in general, and where array copies occur. Any pointers are appreciated.

(I found this answer https://stackoverflow.com/a/5269180/11061421 which uses Array.ST but I'd still like to know what's wrong with my code. Is it even possible to do an in-place sort with plain Arrays like I'm trying?)

Lokapit
  • 73
  • 6
  • Aside from what Jon Purdy said, it's really not QuickSort if you don't use randomized pivots or choose pivots using median-of-medians or similar. – dfeuer Jan 05 '21 at 22:02

1 Answers1

8

Array is an immutable data type, so you’re constructing a new array “spine” once per swap, although the elements (i.e. the values of type t) are shared.

You should generally use Array for computing things like memoisation tables, like when translating dynamic programming algorithms from other languages, not for sequential algorithms like this. An array works best when you fill in the elements with thunks that reference each other (without cycles), and read from an index to compute & cache the element for that index.

The only way to get an actual in-place sort is to use a mutable array type in ST or IO. However, I think you could also do this immutable array sorting a bit more efficiently, by constructing a series of conditional swaps (i.e. a sorting network) and only applying this to the input array once to produce the output array.

Jon Purdy
  • 53,300
  • 8
  • 96
  • 166