Why creating and disposing temporal ByteStrings eats up my memory in Haskell?

Question

Here is a code which creates 1M Int numbers and put them in a list.

main = do
  let l =  [1..1000000]
  putStrLn $ show $ sum (foldl (\aux p -> p:aux) [] l)

(I know it could be more optimal (sum in the fold) but my point is different.) And look at this version

import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.ByteString.Lazy.Builder as Builder
import Data.ByteString.Lazy.Builder.ASCII
import Data.Maybe
import Data.List

main = do
    let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
    let l2 = map (fst . fromJust . B.readInt) l
    putStrLn $ show $ sum (foldl' (\aux p -> p:aux) [] l2)

This version needs 90MB memory! Why? Here is a profiling output

Memory consumption of creating 1M ByteString

What is the purple area?

EDIT after reading the comments I would like to add some clarification.

This is a test. I want to keep 1M numbers in the memory (I'm building a lookup table). So I "want to force the entire list to be held in memory". But I do not want to hold the bytestrings. My small code is a simulation of a case when I load the bytestrings from the disk, convert it to integers and keep the integers in the memory. (that's my goal). But somehow bytestrings remain in the memory. Why?

I don't know enough about these builder libraries, but in general it seems not quite right to use lazy bytestrings for such short strings. Replacing `(Builder.toLazyByteString . intDec)` with `(B.pack . show)` makes it run three times faster on my machine. — firefrorefiddle, Aug 21 '13 at 06:51
Reversing the final list before calling `sum` forces the whole thing to stay in memory rather than being garbage collected. Changing the last line to read `putStrLn $ show $ sum l2` reduces the maximum working set to 42k on my computer. — Michael Steele, Aug 21 '13 at 18:28
@MikeHartl `B.pack.show` also allocated 80MB memory. So this is not the problem. — halacsy, Aug 21 '13 at 19:39

score 2 · Accepted Answer · answered Aug 21 '13 at 21:11

The problem is laziness here.

main = do
    let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
    let l2 = map (fst . fromJust . B.readInt) l
    putStrLn $ show $ sum (foldl' (\aux p -> p:aux) [] l2)

keeps the ByteStrings until sum forces the thunks fst . fromJust . B.readInt to be evaluated. Before that, what you have is a list of thunks each having a reference to one short ByteString - the reversing forces the entire list of thunks + ByteStrings into memory at once.

If you don't want to keep the ByteStrings around, force the thunks to be evaluated,

main = do
    let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
    let l2 = map (fst . fromJust . B.readInt) l
    putStrLn $ show $ sum (foldl' (\aux p -> p `seq` (p:aux)) [] l2)

produces a much smaller memory usage, needing not much more than the plain list of Ints:

enter image description here

Wow. This seems to be the solution. Why isn't it enough to use `b2i x = x `seq` ((fst . fromJust . B.readInt) x)` ? — halacsy, Aug 21 '13 at 21:34
The `seq` in that forces only the `ByteString` (`x`) to be evaluated; and that only when `b2i x` is evaluated. But `map b2i` doesn't evaluate any of the `b2i x` thunks it creates. The consumer of the list created by `map` must evaluate them (all in all, it's much better for `map` to not evaluate the list elements; sometimes, evaluating would be better, you can write your own strict ``map' f (x:xs) = let z = f x in z `seq` (z : map' f xs); map' _ [] = []`` for such cases, using that `map'` for `l2` also fixes the allocation here). — Daniel Fischer, Aug 21 '13 at 21:49

bheklilr · Answer 2 · 2013-08-21T14:40:55.820

1

When I ran your code, I got a stack space overflow exception. So I looked at what was likely causing it, your use of sum. Now, I know that you should never have to hold built-in functions suspect, but this particular function is terrible in that it doesn't compute a partial result as it goes along the list. If you use

mySum :: Num a => [a] -> a
mySum xs = go 0 xs
    where
        go accum [x]    = accum + x
        go accum (x:xs) = go s xs where !s = accum + x

instead (and enable the bang patterns extension because they're easier), then it runs in 44 kB.

EDIT: I'm apparently dumb, just compiling with -O2 makes sum optimize to a strict version. However, your intermediate list of foldl' (\aux p -> p:aux) [] l2 eats up a lot of RAM because it's building the entire list.

edited Aug 21 '13 at 14:40

answered Aug 21 '13 at 12:42

bheklilr

53,530
6
107
163

3

Or just `foldl' (+) 0`. – danidiaz Aug 21 '13 at 14:24
2

Looking at the code I have the impression that `sum` is specialized to strict versions for `Int` and `Integer`. Did you compile with `-O`? – Joachim Breitner Aug 21 '13 at 14:32
@DanielDíazCarrete I didn't realize that one worked too. I like your answer better. Interestingly, this gives the exact same memory usage on my machine. – bheklilr Aug 21 '13 at 14:32
@JoachimBreitner If I compile @user1590575's code with -O2, it actually runs to completion, but it still uses 14 MB of RAM at most. If I remove the `foldl' (\aux p -> p:aux) [] l2` and replace it with just `l2`, it runs with ~41 kB. If I run it with `mySum`, it runs in the same amount of RAM to within a few bytes, and only about 50 ms slower, so I guess remembering to optimize is the best way. I really shouldn't answer SO questions before I have coffee in the morning... – bheklilr Aug 21 '13 at 14:38
I understand that I can optimize sum. This is not my point. My question is, why is `map ( fst . fromJust . B.readInt . Builder.toLazyByteString . intDec)` using so much memory? – halacsy Aug 21 '13 at 15:44
@user1590575 are you sure? When I run it, it says it uses a maximum of 41 MB with `+RTS -t`, but if I remove the `foldl' (\aux p -> p:aux) []`, it runs in 2MB of RAM. I don't know why you're reversing your list before calculating the sum, but it forces the entire list to be held in memory. Since this is done lazily, you will have a thunk for each element of the list representing your conversion to and from bytestring. You can get this down a bit by using a strict version of your reverse step, but it isn't really worth it since the reverse is what is causing the high memory usage. – bheklilr Aug 21 '13 at 16:38
this is a test. I want to keep 1M numbers in the memory (I'm building a lookup table". So I "want to force the entire list to be held in memory". But I do not want to hold the bytestrings. My small code is a simulation of a case when I load the bytestring from the disk, convert it to integers and keep the integers in the memory. (that's my goal). But somehow bytestrings remain in the memory. Why? – halacsy Aug 21 '13 at 18:23
@user1590575 I've poked at it a bit, but I can't get ghc to give me enough information to figure out what's going on. The graphs I'm getting show that both the bytestrings and the ints are getting allocated simultaneously (which is what we want), and deallocated simultaneously.The GC doesn't clean up all of the bytestrings at the same time, so it tapers off. I wouldn't worry much about your memory usage if it tapers off within a second. If you want to read more about profiling and optimization, I'd suggest reading http://book.realworldhaskell.org/read/profiling-and-optimization.html – bheklilr Aug 21 '13 at 19:05
My problem is that it's deallocating simultaneously. I want to keep the `[Int]` in the memory but not the `[ByteString]`. – halacsy Aug 21 '13 at 19:34

Why creating and disposing temporal ByteStrings eats up my memory in Haskell?

2 Answers2