
I come from a C++ background, so I'm not sure if I'm even going about this properly. What I'm trying to do is write quicksort, but fall back to insertion sort when the length of a list is below a certain threshold. So far I have this code:

import Data.List (insert)

insertionSort :: (Ord a) => [a] -> [a]
insertionSort [] = []
insertionSort (x:xs) = insert x (insertionSort xs)

quickSort :: (Ord a) => [a] -> [a]
quickSort x = qsHelper x (length x)

qsHelper :: (Ord a) => [a] -> Int -> [a]
qsHelper [] _ = []
qsHelper (x:xs) n 
    | n <= 10 = insertionSort (x:xs)
    | otherwise = qsHelper before (length before) ++ [x] ++ qsHelper after (length after)
        where
            before = [a | a <- xs, a < x]
            after = [a | a <- xs, a >= x]

Now what I'm concerned about is calculating the length of each list every time. I don't fully understand how Haskell optimizes things, or the full effect of lazy evaluation on code like the above, but it seems like calling length on each of the before and after lists is wasteful. Is there a way to extract the number of matches from a list comprehension while the comprehension is being evaluated?

I.e. if we had [x | x <- [1,2,3,4,5], x > 3] (which results in [4,5]) could I get the count of [4,5] without using a call to length?

Thanks for any help/explanations!

aliak
2 Answers


Short answer: no.

Less short answer: yes, you can fake it. import Data.Monoid, then

    | otherwise =  qsHelper before lenBefore ++ [x] ++ qsHelper after lenAfter
        where
            (before, Sum lenBefore) = mconcat [([a], Sum 1) | a <- xs, a < x]
            (after, Sum lenAfter) = mconcat [([a], Sum 1) | a <- xs, a >= x]
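Pieced together, the trick looks like this as a standalone function (splitCounted is an illustrative name, not a library function; it still traverses xs once per comprehension, but never calls length):

```haskell
import Data.Monoid (Sum (..))

-- Partition around a pivot while counting as we go, using the pair
-- and Sum monoids: mconcat combines ([a], Sum 1) pieces element-wise.
splitCounted :: Ord a => a -> [a] -> (([a], Int), ([a], Int))
splitCounted x xs = ((before, lenBefore), (after, lenAfter))
  where
    (before, Sum lenBefore) = mconcat [([a], Sum 1) | a <- xs, a < x]
    (after,  Sum lenAfter)  = mconcat [([a], Sum 1) | a <- xs, a >= x]

main :: IO ()
main = print (splitCounted 3 [1, 4, 2, 5 :: Int])
-- prints (([1,2],2),([4,5],2))
```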

Better answer: you don't want to.

Common reasons to avoid length include:

  • its running time is O(N)
    • but it costs us O(N) to build the list anyway
  • it forces the list spine to be strict
    • but we're sorting the list: we have to (at least partially) evaluate each element in order to know which is the minimum; the list spine is already forced to be strict
  • if you don't care how long the list is, only whether it's shorter or longer than another list or a threshold, then length is wasteful: it walks all the way to the end of the list regardless
    • BINGO
isLongerThan :: Int -> [a] -> Bool
isLongerThan _ []     = False
isLongerThan 0 _      = True
isLongerThan n (_:xs) = isLongerThan (n-1) xs

quickSort :: (Ord a) => [a] -> [a]
quickSort []     = []
quickSort (x:xs) 
    | not (isLongerThan 10 (x:xs)) = insertionSort (x:xs)
    | otherwise =  quickSort before ++ [x] ++ quickSort after
        where
            before = [a | a <- xs, a < x]
            after = [a | a <- xs, a >= x]
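A bonus of isLongerThan over length: it inspects at most n+1 cons cells, so it even terminates on infinite lists, where length would loop forever. A small self-contained check (repeating the definition above):

```haskell
isLongerThan :: Int -> [a] -> Bool
isLongerThan _ []     = False
isLongerThan 0 _      = True
isLongerThan n (_:xs) = isLongerThan (n-1) xs

main :: IO ()
main = do
  print (isLongerThan 10 [1..5 :: Int])  -- False: only 5 elements
  print (isLongerThan 10 [1..])          -- True: stops after 11 cells
```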

The real inefficiency here though is in before and after. They both step through the entire list, comparing each element against x. So we are stepping through xs twice, and comparing each element against x twice. We only have to do it once.

            (before, after) = partition (< x) xs

partition is in Data.List.
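For reference, partition makes a single traversal, applying the predicate once per element and routing each element to one side or the other:

```haskell
import Data.List (partition)

main :: IO ()
main = print (partition (< 3) [1, 4, 2, 5, 0 :: Int])
-- prints ([1,2,0],[4,5])
```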

dave4420
  • Thanks for that! I like the idea of using isLongerThan. But now I wonder if quicksort in haskell would be worth it in production code. From reading up so far it seems like merge sort is more appropriate for haskell's style. Also perhaps switching to insertion sort is not something appropriate for haskell either. Couldn't find any rough numbers out there so far, but I'll look into this more. – aliak Jul 31 '12 at 13:23
  • Oh, I thought you were doing this as a learning exercise. Why aren't you using `sort` from Data.List? – dave4420 Jul 31 '12 at 13:38
  • 1
    @aliak If the goal is a fast sorting algorithm and you don't need a lazily generated result, you can try to wrap `Data.Vector.Algorithms.Intro.sort`. I found that to be [faster](http://stackoverflow.com/questions/11481675/using-vectors-for-performance-improvement-in-haskell/11485331#11485331) than `Data.List.sort` in general. – Daniel Fischer Jul 31 '12 at 14:13
  • @dave4420 It is a learning exercise :) But also trying to understand how practical it is. – aliak Aug 01 '12 at 07:12

No, there is no way to use list comprehensions to simultaneously filter and count the found elements. But if you are worried about this performance hit, you should not be using list comprehensions this way in the first place: you are filtering the list twice, applying the predicate (< x) and its negation to each element. A better variant would be

(before, after) = partition (< x) xs

Starting from that it is not hard to write a function

partitionAndCount :: (a -> Bool) -> [a] -> (([a],Int), ([a],Int))

that simultaneously partitions the list and counts the elements in each of the returned lists:

((before, lengthBefore), (after, lengthAfter)) = partitionAndCount (< x) xs

Here is a possible implementation (with a slightly reordered type):

{-# LANGUAGE BangPatterns #-}
import Control.Arrow

-- Walk the list once, consing each element onto the matching side
-- and bumping the corresponding strict counter as we go.
partitionAndCount :: (a -> Bool) -> [a] -> (([a], [a]), (Int, Int))
partitionAndCount p = go 0 0
    where go !c1 !c2 [] = (([],[]),(c1,c2))
          go !c1 !c2 (x:xs) = if p x
                              then first (first (x:))  (go (c1 + 1) c2 xs)
                              else first (second (x:)) (go c1 (c2 + 1) xs)

And here you can see it in action:

*Main> partitionAndCount (>=4) [1,2,3,4,5,3,4,5]
(([4,5,4,5],[1,2,3,3]),(4,4))
Joachim Breitner
  • Thanks, yeah I see that now. But wouldn't writing a partitionAndCount function require the use of length as well? Unless I use Monoids, which were mentioned above, but I frankly have no idea about those right now :) – aliak Jul 31 '12 at 13:28
  • No, you could use a strict accumulating parameter to get the count quite efficiently. The function would not be lazy, but that is unavoidable if you want the length anyway. – Joachim Breitner Jul 31 '12 at 22:44