IO/Monadic assign operator causing ghci to explode for infinite list

Question

Consider the following program. It runs forever and does nothing useful, but the memory consumption in ghci is constant :

--NoExplode.hs
module Main (main) where

test :: [Int] -> IO()
test lst = do
  print "test"
  rList lst

rList :: [Int] -> IO ()
rList [] = return ()
rList (x:xs) = do
  rList xs

main = do
  test [1..]

Now consider the following trivially modified version of the above. When this program is run in ghci the memory explodes. The only difference is that print "test" is now assigned to x in the do block of test.

--Explode.hs
module Main (main) where

test :: [Int] -> IO()
test lst = do
  x <- print "test"
  rList lst

rList :: [Int] -> IO ()
rList [] = return ()
rList (x:xs) = do
  rList xs

main = do
  test [1..]

Why does changing print "test" to x <- print "test" cause ghci to blow up?

p.s. I came across this when trying to understand Memory exploding upon writing a lazy bytestring to file in ghci, and the problem there (I think) essentially distils to the above. Thanks

Certainly seems like a bug in GHCi to me. Interestingly, this manifests in GHCi even if you compile the module with optimizations and then load the compiled code into GHCi. It works fine if you just compile it and run it without GHCi, even without optimizations. So it would seem that something in GHCi is maintaining a reference to the head of the list, which isn't even visible outside the module. — lpsmith, Jul 28 '14 at 04:11
For those who actually want to run the code in GHCi, stay safe and run GHCi with limited heap (`ghci +RTS -M100m --RTS …`). — Zeta, Jul 28 '14 at 11:52
Given that ghc floats `[1..]` to a top-level constant (`lst_rq4` in the Core shown in Zeta's answer), the difference in behavior between compiling the program and running it in ghci is not surprising. After all `main` has a reference to `lst_rq4` and `main` is exported, and indeed you could type `main` into ghci, then hit Ctrl-C after a little while and type `main` again, and it would reuse the computation of `lst_rq4`. — Reid Barton, Jul 28 '14 at 17:31

Zeta · Accepted Answer · 2014-07-28T07:54:27.493

Disclaimer: I'm not a GHCi expert, and also not that good with GHC core. Now that I've lost my credibility, lets try to understand what happens:

GHCi and CAFs

GHCi retains all evaluated CAFs:

Normally, any evaluation of top-level expressions (otherwise known as CAFs or Constant Applicative Forms) in loaded modules is retained between evaluations.

Now you might wonder why there's such a big difference between both versions. Lets have a look at the core with -ddump-simpl. Note that you might want to drop -dsuppress-all when you dump the programs yourself.

Dumps of your programs

Non-exploding version:

❯ ghc SO.hs -ddump-simpl -fforce-recomp -O0 -dsuppress-all
[1 of 1] Compiling Main             ( SO.hs, SO.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 29, types: 28, coercions: 0}

$dShow_rq2
$dShow_rq2 = $fShow[] $fShowChar

Rec {
rList_reI
rList_reI =
  \ ds_dpU ->
    case ds_dpU of _ {
      [] -> return $fMonadIO ();
      : x_aho xs_ahp -> rList_reI xs_ahp
    }
end Rec }

main
main =
  >>
    $fMonadIO
    (print $dShow_rq2 (unpackCString# "test"))
    (rList_reI (enumFrom $fEnumInt (I# 1)))

main
main = runMainIO main

The important part is the location of [1..], almost at the end:

enumFrom $fEnumInt (I# 1))

As you can see, the list isn't a CAF. But what happens if we instead use the exploding version?

Exploding version

❯ ghc SO.hs -ddump-simpl -fforce-recomp -O0 -dsuppress-all
[1 of 1] Compiling Main             ( SO.hs, SO.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 32, types: 31, coercions: 0}

$dShow_rq3
$dShow_rq3 = $fShow[] $fShowChar

Rec {
rList_reI
rList_reI =
  \ ds_dpV ->
    case ds_dpV of _ {
      [] -> return $fMonadIO ();
      : x_ahp xs_ahq -> rList_reI xs_ahq
    }
end Rec }

lst_rq4
lst_rq4 = enumFrom $fEnumInt (I# 1)

main
main =
  >>=
    $fMonadIO
    (print $dShow_rq3 (unpackCString# "test"))
    (\ _ -> rList_reI lst_rq4)

main
main = runMainIO main

There's suddenly a new top-level expression, namely lst_rq4, which generates the list. And as seen before, GHCi retains the evaluations of top-level expressions, so lst_rq4 will also be retained.

Now there is an option to discard the evaluations:

Turning on +r causes all evaluation of top-level expressions to be discarded after each evaluation (they are still retained during a single evaluation).

But since "they are still retained during a single evaluation" even :set +r won't help you in this case. Unfortunately I cannot answer why GHC introduces a new top-level expression.

Why does this even happen in optimized code?

The list is still a top-level expression:

main2
main2 = eftInt 1 2147483647

Funny enough, GHC actually doesn't create an infinite list, since Int is bounded.

How can one get rid of the leak?

In this case you can get rid of it if you place the list in test:

test = do
   x <- print "test"
   rList [1..]

This will prevent GHC from creating a top-level expression.

However, I can't really give a general advise on this. Unfortunately, my Haskell-fu isn't yet good enough.

Thanks, your post was quite illuminating. Unfortunately I cannot use your solution in my code! I will do some fiddling and see if I can find another way. — artella, Jul 28 '14 at 11:38
@artella: I thought so. [Don's and the other answers could help you a little bit more](http://stackoverflow.com/questions/6090932/how-to-make-a-caf-not-a-caf-in-haskell). — Zeta, Jul 28 '14 at 11:48
Thanks. I had seen this post before and tried some of the tricks mentioned there but unfortunately they don't work (e.g. wrapping `[1..]` in a `() -> [Int]` function and passing to `test`). — artella, Jul 28 '14 at 12:14