1

I have the following type and two relevant functions that I intend to measure as a part of a large list fold:

Type and access functions:

data Aggregate a = Aggregate (Maybe a) (a -> Aggregate a)

get :: Aggregate a -> Maybe a
get (Aggregate get' _) = get'

put :: Aggregate a -> a -> Aggregate a
put (Aggregate _ put') = put'

First function:

updateFirst :: Maybe a -> a -> Aggregate a
updateFirst cur val = Aggregate new (updateFirst new)
  where new = mplus cur (Just val)

first :: Aggregate a
first = Aggregate Nothing (updateFirst Nothing)

Second function:

updateMinimum :: Ord a => Maybe a -> a -> Aggregate a
updateMinimum cur val = Aggregate new (updateMinimum new)
  where new = min <$> (mplus cur (Just val)) <*> Just val

minimum :: Ord a => Aggregate a
minimum = Aggregate Nothing (updateMinimum Nothing)

The functions are written in a way that the memory should be constant. Therefore, I have opted in to use the Strict language extension in GHC, which forces the thunks to be evaluated. The Weigh library was used to perform the allocation measurements:

test :: A.Aggregate Double -> Int -> Maybe Double
test agg len = A.get $ F.foldl' A.put agg (take len $ iterate (+0.3) 2.0)

testGroup :: String -> A.Aggregate Double -> Weigh ()
testGroup name agg = sequence_ $ map (\cnt -> func (str cnt) (test agg) cnt) counts
  where
    counts  = map (10^) [0 .. 6]
    str cnt = name ++ (show cnt)

main :: IO ()
main =
  mainWith
    (do setColumns [Case, Allocated, Max, GCs]
        testGroup "fst" A.first
        testGroup "min" A.minimum
    )

The Weigh output is as follows:

Case          Allocated          Max  GCs
fst1                304           96    0
fst10             2,248           96    0
fst100           21,688           96    0
fst1000         216,088           96    0
fst10000      2,160,088           96    2
fst100000    21,600,088           96   20
fst1000000  216,000,088           96  207
min1                552           96    0
min10             4,728           96    0
min100           46,488           96    0
min1000         464,088           96    0
min10000      4,967,768           96    4
min100000    49,709,656    6,537,712   44
min1000000  497,226,840  103,345,512  445

Why does GHC suddenly allocate way more memory in the inputs of size 10^5 and 10^6? My GHC version is 8.4.4.

Daniel Lovasko
  • 471
  • 2
  • 10
  • The suspect here is `new = ...` which is not evaluated until needed because of laziness. Try forcing that before returning the result, as in ``updateFirst cur val = new `seq` Aggregate new (updateFirst new)``. A bang pattern could also work, alternatively. (Maybe more forcing is needed but I'd start from this.) – chi Dec 28 '19 at 23:47
  • The module which contains the `updateFirst` is using the `Strict` extension - my understanding was that it automagically forces all thunks - is that not the case? – Daniel Lovasko Dec 28 '19 at 23:49
  • It could be the case, if that's also the module which defines `Aggregate`. Still, I'd try to test it. – chi Dec 29 '19 at 00:18
  • 2
    @DanielLovasko "my understanding was that it automagically forces all thunks " The `Strict` extension is humbler than that. It's equivalent to making the fields of all datatypes defined within the module strict. And a field being strict simply means that, whenever the value of the datatype is evaluated to WHNF—say, as the accumulator of a `foldl'`—the field will also be evaluated to WHNF. Notice that, in particular, `Maybe` doesn't become strict, as it is defined elsewhere. Perhaps that's part of the problem. Try forcing the value inside the `Just`, not only the outer `Maybe`. – danidiaz Dec 29 '19 at 00:51
  • 1
    Are there any tools apart from a pencil and a paper that would help me find which expression is causing the thunk creation? Or rather, is there a way for GHC to print all the thunks that it creates? As for `Strict`, I think that you are describing `StrictData` which makes the types strict, but the pure `Strict` extension should reach function arguments as well. – Daniel Lovasko Dec 29 '19 at 02:06
  • @danidiaz - you were right - forcing the value within the `Just` constructor with `BangPatterns` was indeed the key to fixing the memory leak. Thanks! Would you like to post your comment as an answer so that I can accept it please? – Daniel Lovasko Dec 29 '19 at 02:57

1 Answers1

3

Strictness annotations in Haskell are "relational" so to speak. They say that something must be evaluated to WHNF (weak head normal form) whenever something else is evaluated to WHNF.

For function arguments, it means that a function argument will be evaluated to WHNF before the function application itself is evaluated to WHNF.

For strict fields, it means the field will be evaluated to WHNF whenever the containing value is evaluated to WHNF. This is useful for maintaining the "chain of strictness" in datatypes that serve as accumulators (for example a datatype working as the accumulator of a foldl'). Otherwise, thunks can hide behind lazy fields even as the containing value remains in WHNF, and cause a space leak. In particular, tuples don't have strict components, and are a common source of space leaks in accumulators.

Maybe also isn't strict on the value contained in the Just constructor. In fact, this was the cause of the problem. The value inside the Just was never forced during the course of the foldl', and thunks accumulated there.

Why didn't Strict prevent the problem? Because it only affects function and datatype definitions in the current module, and Maybe was defined elsewhere. The solution is to manually force the value inside the Just at each iteration, or perhaps define your own "strict Maybe".

danidiaz
  • 26,936
  • 4
  • 45
  • 95
  • Related: https://stackoverflow.com/questions/23280936/how-do-experienced-haskell-developers-approach-laziness-at-design-time – danidiaz Dec 29 '19 at 08:26