I was testing the performance of the partition
function for lists and got some strange results, I think.
We have that partition p xs == (filter p xs, filter (not . p) xs)
but we chose the first implementation because it only performs a single traversal over the list. Yet, the results I got say that it maybe be better to use the implementation that uses two traversals.
Here is the minimal code that shows what I'm seeing
import Criterion.Main
import System.Random
import Data.List (partition)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)
randList :: RandomGen g => g -> Integer -> [Integer]
randList gen 0 = []
randList gen n = x:xs
where
(x, gen') = random gen
xs = randList gen' (n - 1)
main = do
gen <- getStdGen
let arg10000000 = randList gen 10000000
defaultMain [
bgroup "filters -- split list in half " [
bench "partition100" $ nf (partition (>= 50)) arg10000000
, bench "mypartition100" $ nf (mypartition (>= 50)) arg10000000
]
]
I ran the tests both with -O
and without it and both times I get that the double traversals is better.
I am using ghc-7.10.3
with criterion-1.1.1.0
My questions are:
Is this expected?
Am I using Criterion correctly? I know that laziness can be tricky and
(filter p xs, filter (not . p) xs)
will only do two traversals if both elements of the tuple are used.Does this has to do something with the way lists are handled in Haskell?
Thanks a lot!