2

Basically I have a simple function call, which when used in conjunction with Criterion, results in the memory consumption exploding.

Suppose I have the following program :

{-# OPTIONS_GHC -fno-cse #-}
{-# LANGUAGE BangPatterns #-}
module Main where
import Criterion.Main
import Data.List

num :: Int
num = 10000000

lst :: a -> [Int]
lst _ = [1,2..num]

myadd :: Int -> Int -> Int
myadd !x !y = let !result = x + y in
  result

mysum = foldl' myadd 0

main :: IO ()
main = do
  print $ mysum (lst ())

Then this program (compiled with O0) runs fine, without the memory exploding.

If we use cabal build -v to yield a dump of the compilation commands invoked, and then tag -ddump-simpl -fforce-recomp -O0 -dsuppress-all (suggested in IO/Monadic assign operator causing ghci to explode for infinite list) to the end of the ghc --make -no-link ... command, we get the following core :

num
num = I# 10000000

lst
lst = \ @ a_a3Yn _ -> enumFromThenTo $fEnumInt (I# 1) (I# 2) num

myadd
myadd =
  \ x_a3Cx y_a3Cy ->
    case x_a3Cx of x1_X3CC { I# ipv_s4gX ->
    case y_a3Cy of y1_X3CE { I# ipv1_s4h0 ->
    + $fNumInt x1_X3CC y1_X3CE
    }
    }

mysum
mysum = foldl' myadd (I# 0)

main
main =
  print
    $fShowInt (mysum (enumFromThenTo $fEnumInt (I# 1) (I# 2) num))

main
main = runMainIO main

It seems that no CAFs are being produced, which is consistent with the fact that the program does not explode. Now if I run the following program which uses criterion 1.1.0.0 :

{-# OPTIONS_GHC -fno-cse #-}
{-# LANGUAGE BangPatterns #-}
module Main where
import Criterion.Main
import Data.List

num :: Int
num = 10000000

lst :: a -> [Int]
lst _ = [1,2..num]

myadd :: Int -> Int -> Int
myadd !x !y = let !result = x + y in
  result

mysum = foldl' myadd 0

main :: IO ()
main = defaultMain [
  bgroup "summation" 
    [bench "mysum" $ whnf mysum (lst ())]
  ]

then the memory consumption explodes. However printing the core yields :

num
num = I# 10000000

lst
lst = \ @ a_a3UV _ -> enumFromThenTo $fEnumInt (I# 1) (I# 2) num

myadd
myadd =
  \ x_a3Cx y_a3Cy ->
    case x_a3Cx of x1_X3CC { I# ipv_s461 ->
    case y_a3Cy of y1_X3CE { I# ipv1_s464 ->
    + $fNumInt x1_X3CC y1_X3CE
    }
    }

mysum
mysum = foldl' myadd (I# 0)

main
main =
  defaultMain
    (: (bgroup
      (unpackCString# "summation"#)
      (: (bench
            (unpackCString# "mysum"#)
            (whnf mysum (enumFromThenTo $fEnumInt (I# 1) (I# 2) num)))
         ([])))
       ([]))

main
main = runMainIO main

and it seems that no CAFs are being produced. Therefore why does the latter program, which uses criterion, result in the memory consumption exploding, whereas the former program does not? I am using GHC version 7.8.3

Community
  • 1
  • 1
artella
  • 5,068
  • 4
  • 27
  • 35

2 Answers2

3

You don't need to inspect the source of criterion to know that lst () will be shared: any subexpression will be shared (and therefore computed at most once) during the evaluation of the body of the immediately surrounding lambda. Additional lambdas may be introduced by overloading, various syntactic sugar constructs, and compiler optimizations, but none of that happens here as you can see from the Core.

If you don't want lst () to be shared then you should refactor the arguments of whnf to something like whnf (mysum . lst) ().

Reid Barton
  • 14,951
  • 3
  • 39
  • 49
  • Thanks, I had actually tried `whnf (mysum .lst) ()` before when I was using `O3` optimization, but I found that the benchmark time was independent of `num`, so I discarded this idea. However rerunning this again with `O0` shows a clear dependence on `num`. – artella Mar 30 '15 at 06:24
  • I asked the question about the behaviour with `O3` optimization at http://stackoverflow.com/questions/29340053/preventing-caching-of-computation-in-criterion-benchmark – artella Mar 30 '15 at 06:56
1

In your version without criterion, the list returned by lst () gets produced lazily and then incrementally garbage collected while mysum consumes it, since there are no other references to the list.

For the criterion version, however, look at the definition of whnf:

whnf :: (a -> b) -> a -> Benchmarkable
whnf = pureFunc id
{-# INLINE whnf #-}

and pureFunc:

pureFunc :: (b -> c) -> (a -> b) -> a -> Benchmarkable
pureFunc reduce f0 x0 = Benchmarkable $ go f0 x0
  where go f x n
          | n <= 0    = return ()
          | otherwise = evaluate (reduce (f x)) >> go f x (n-1)
{-# INLINE pureFunc #-}

It seems like x in go above would eventually become bound to the list returned by your lst (), while n is the number of iterations for the benchmarking. When the first benchmark iteration is finished, x will all have been evaluated, but this time it cannot be garbage collected: It is still kept in memory because it is shared with the following iterations through the recursive go f x (n-1).

Ørjan Johansen
  • 18,119
  • 3
  • 43
  • 53