Why is summing native lists slower than summing Church-encoded lists with `GHC -O2`?

Question

To test how Church-encoded lists perform against user-defined lists and native lists, I've prepared 3 benchmarks:

User-defined lists

data List a = Cons a (List a) | Nil deriving Show
lenumTil n        = go n Nil where
    go 0 result   = result
    go n result   = go (n-1) (Cons (n-1) result)
lsum Nil          = 0
lsum (Cons h t)   = h + (lsum t)

main = print (lsum (lenumTil (100000000 :: Int)))

Native lists

main = print $ sum ([0..100000000-1] :: [Int])

Church lists

fsum   = (\ a -> (a (+) 0))
fenumTil n cons nil = go n nil where
    go 0 result    = result
    go n result    = go (n-1) (cons (n-1) result)
main = print $ (fsum (fenumTil (100000000 :: Int)) :: Int)

The benchmark results are unexpected:

User-defined lists

-- 4999999950000000
-- real 0m22.520s
-- user 0m59.815s
-- sys  0m20.327s

Native Lists

-- 4999999950000000
-- real 0m0.999s
-- user 0m1.357s
-- sys  0m0.252s

Church Lists

-- 4999999950000000
-- real 0m0.010s
-- user 0m0.002s
-- sys  0m0.003s

One would expect that, with the large number of optimizations targeted specifically at native lists, they would perform best. Yet Church lists outperform them by a factor of 100, and outperform user-defined ADTs by a factor of 2250.

I compiled all programs with GHC -O2. I tried replacing sum with foldl', with the same result. I also tried adding user input to make sure the Church-list version wasn't being optimized to a constant. arkeet pointed out on #haskell that, judging from the Core, the native version has an intermediate list, but why? When allocation is forced with an additional reverse, all 3 perform roughly the same.

Not relevant, but... : `go 0 result` should be `result (+) nil ` (right?) — chi, Aug 19 '15 at 07:40
Can you post the corresponding Core? (I.e., if it isn't absolutely gigantic.) — MathematicalOrchid, Aug 19 '15 at 08:13
For what it's worth, on my machine your native list version takes 0.76s on GHC 7.6.3 and GHC 7.8.4, but only 0.05s on GHC 7.10.1. The Church version takes about 0.06s on all three. — Daniel Wagner, Aug 19 '15 at 08:31
(Oh, and the custom list version is dog slow on all three. But considering that it's not written in the same guarded recursive style that the other two are, that's pretty understandable.) — Daniel Wagner, Aug 19 '15 at 08:34
@MathematicalOrchid I'm not well versed in Core, it is giving me something giant. What should I do? — MaiaVictor, Aug 20 '15 at 04:57

score 19 · Accepted Answer · answered Aug 19 '15 at 09:00

GHC 7.10 has call arity analysis, which lets us define foldl in terms of foldr and thus let left folds, including sum, participate in fusion. GHC 7.8 also defines sum with foldl but it can't fuse the lists away. Thus GHC 7.10 performs optimally and identically to the Church version.
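To illustrate what "foldl in terms of foldr" means, here is a minimal sketch. The names `foldlViaFoldr` and `sumViaFoldr` are made up for this example and are not the definitions in base; the sketch only shows the shape of the encoding, in which every list element becomes a function that receives the accumulator. Because the list is consumed by foldr, a producer such as an enumeration can in principle fuse with it.

```haskell
-- Left fold expressed through foldr (illustrative name, not the base definition).
-- Each element x turns into a function that takes the accumulator,
-- applies f to it, and passes the result on to the rest of the fold (k).
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr (\x k acc -> k (f acc x)) id xs z

-- Hypothetical helper: sum of [0 .. n-1] built on the foldr-based left fold.
sumViaFoldr :: Int -> Int
sumViaFoldr n = foldlViaFoldr (+) 0 [0 .. n - 1]

main :: IO ()
main = do
  print (sumViaFoldr 1000)                   -- 499500
  print (foldlViaFoldr (flip (:)) [] "abc")  -- "cba"
```

Without an analysis that recognises how many arguments the generated functions are always called with, this encoding tends to allocate a closure per element, which is why it only pays off on compilers that can optimise it.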

The Church version is child's play to optimize in either GHC version. We just have to inline (+) and 0 into fenumTil, and then we have a patently tail-recursive go which can be readily unboxed and then turned into a loop by the code generator.
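Doing that inlining by hand gives roughly the following. This is a sketch of the idea, not GHC's actual output; the name `sumBelow` is made up, and the bang pattern is an addition that makes the accumulator strict explicitly, where the optimizer would normally work that out on its own.

```haskell
{-# LANGUAGE BangPatterns #-}

-- fenumTil with cons replaced by (+) and nil replaced by 0.
-- No list is ever built: go is a plain counting loop with an accumulator.
-- Like the original, it assumes n is not negative.
sumBelow :: Int -> Int
sumBelow n = go n 0
  where
    go :: Int -> Int -> Int
    go 0 !acc = acc
    go k !acc = go (k - 1) ((k - 1) + acc)

main :: IO ()
main = print (sumBelow 100000000)  -- 4999999950000000
```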

The user-defined version is not tail-recursive and it works in linear space, which wrecks performance, of course.
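For comparison, a tail-recursive sum over the user-defined list could look like the sketch below. The name `lsumAcc` and the type signatures are assumptions added for this example. It removes the stack growth in the summing step, but the list itself is still built in full by lenumTil, so it should not be expected to match the fused versions.

```haskell
{-# LANGUAGE BangPatterns #-}

data List a = Cons a (List a) | Nil

-- Same enumeration as in the question, with a type signature added.
lenumTil :: Int -> List Int
lenumTil n = go n Nil
  where
    go 0 result = result
    go k result = go (k - 1) (Cons (k - 1) result)

-- Accumulator-passing sum: the recursive call is in tail position,
-- and the bang pattern keeps the running total evaluated.
lsumAcc :: List Int -> Int
lsumAcc = go 0
  where
    go !acc Nil        = acc
    go !acc (Cons h t) = go (acc + h) t

main :: IO ()
main = print (lsumAcc (lenumTil 1000))  -- 499500
```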

answered Aug 19 '15 at 09:00

András Kovács


Omitted GHC Core because it doesn't show us anything interesting beyond what I wrote above. – András Kovács Aug 19 '15 at 09:24
I would like to see it :( but okay! Thanks András, I know someone would have the answer. – MaiaVictor Aug 19 '15 at 15:08
Eh, are you sure it is compiled to a loop? The executable works even for really, really huge numbers. Looks like it is compiled to the sum formula... (?) – MaiaVictor Aug 20 '15 at 02:37
(Maybe LLVM itself is compiling the loop to the formula, though?) – MaiaVictor Aug 20 '15 at 04:58
@Viclib Computers are really good at adding big numbers. They are bad at storing a big number of numbers. – PyRulez Aug 29 '15 at 16:50
