Prevent list comprehension from accumulating in memory

Question

I'd like to work sequentially on a list of digits that follow a particular pattern:

99999999999999
99999999999998
...
99999999999991
99999999999989

In other words, 14 digits of decreasing numeric value but only using the digits 1 through 9. Each digit is used individually, so the representation as a number is only for convenience.

For this sort of thing I've always used a combination of a list comprehension and sequence, like so:

ins = let dig = reverse ['1'..'9'] in sequence (replicate 14 dig)

and relied on Haskell's laziness to ensure not all 14 trillion elements are created at once. Alas, this is back-firing, since my application is exploding in memory usage. I've whittled it waaaaay down to this MRE:

run = let ins = let dig = reverse ['1'..'9'] in sequence (replicate 14 dig)
          outs = map (\x -> (x, map digitToInt x)) ins
      in filter isGood outs
  where isGood (_,i) = if sum i == 50 then trace "Found one." True else False

My expectation is that filter isGood outs would iteratively evaluate an element in ins, and discard all those that are not isGood. What actually happens is that within a few seconds, the application uses GB of RAM even before a single isGood is found, and it keeps rising rapidly until I kill it. I've tried various combinations of head and dropWhile to try to force thunks to be evaluated and unused results to be discarded, but to no effect. I seem to have narrowed the culprit down to the sequence line.

Can this sort of list comprehension be used without keeping all generated results in RAM? Is there another list generation technique (short of a recursive function that calculates the next element based by parsing the current element) I can use that might do better?

Full compilable program for reference:

module Main where

import Data.Char (digitToInt)
import Debug.Trace (trace)


main :: IO ()
main = do
  print run

run = let ins = let dig = reverse ['1'..'9'] in sequence (replicate 14 dig)
          outs = map (\x -> (x, map digitToInt x)) ins
      in filter isGood outs
  where isGood (_,i) = if sum i == 50 then trace "Found one." True else False

Bouyed by the power of the MRE, I gave this a go instead: `ins = filter (notElem '0') (map show [99999999999999,99999999999998..11111111111111])`. Crude, but wow - memory usage hits **2.5MB** and doesn't budge, while execution seems to be much the same. So `sequence` is to be avoided in this case? — Heath Raftery, Sep 26 '22 at 03:18
Aside: `[x | x <- xs]` is just `xs`. So you could simplify further to `let digs = sequence $ replicate 14 ['9', '8'..'1']`. — amalloy, Sep 26 '22 at 03:49
Oh yeah @amalloy of course! I got wound up in complexity that is entirely unnecessary here. Nice one, that is indeed equivalent, including the memory usage issue! So that narrows it down even further - this isn't a list comprehension issue, per se. — Heath Raftery, Sep 26 '22 at 04:25
The `sequence` library function is known to have suboptimal memory requirements when used to make a large Cartesian product. See for example: [SO question 61875763](https://stackoverflow.com/questions/61875763/haskell-running-out-of-memory-with-finite-lists) — jpmarinier, Sep 26 '22 at 09:32
For those arriving here post closure, note that the question has been edited to remove the list comprehensions so it's a little hard to follow now. But ultimately the list comprehension turns out to be a red herring, and it is indeed `sequence` that accumulates build products. The linked answer explains this nicely as "GHC decides to keep [partial subsequences] around so they don't need to be recomputed each time each time", and my comment above provides a specific workaround for the question here: use a `filter`ed range instead. — Heath Raftery, Sep 26 '22 at 19:17

Prevent list comprehension from accumulating in memory

0 Answers0