Operating infinite lists with strict monads

Question

I have a function f :: [a] -> b that operates on infinite lists (e.g. take 5, takeWhile (< 100) . scanl (+) 0 and so on). I want to feed this function with values generated by strict monadic actions (e.g. randomIO).

From this question, I've learned that the repeat and sequence trick approach does not work for strict monads, as the example below show:

import Control.Monad.Identity

take 5 <$> sequence (repeat $ return 1) :: Identity [Int]
-- returns `Identity [1,1,1,1,1]`
-- works because Identity is non-strict

take 5 <$> sequence (repeat $ return 1) :: IO [Int]
 -- returns `*** Exception: stack overflow`
 -- does not work because IO is strict

So, instead, I thought about using the function "inside" the monadic context. I was inspired by this circular programming example and tried:

let loop = do
       x <- return 1
       (_, xs) <- loop
       return (take 5 xs, x:xs)
in  fst loop :: Identity [Int]
-- Overflows the stack

and

import Control.Monad.Fix

fst <$> mfix (\(_, xs) -> do
   x <- return 1
   return (take 5 xs, x:xs)) :: Identity [Int]
-- Overflows the stack

and even

{-# LANGUAGE RecursiveDo #-}
import System.Random

loop' = mdo
   (xs', xs) <- loop xs
   return xs'
where loop xs = do
      x <- randomIO
      return (take 5 xs, x:xs)

print $ loop'
-- Returns a list of 5 identical values

But none of these works. I also tried a Conduit approach which did not work either even in the Identity case:

import Conduit

runConduitPure $ yieldMany [1..] .| sinkList >>= return . take 5

Therefore I would like to know:

Why none of the "circular" approaches above work?
If there exists a solution to this that does not involve unsafeInterleaveIO. (maybe iteratees, Arrows?)

In general, this is a tough problem. For random numbers in particular, there is an easy way out: `randoms <$> newStdGen :: Random a => IO [a]` is an infinite random list of whatever you want (that is `Random`). — Alec, Feb 11 '17 at 05:59
@Alec Thanks for the comment. I am using `randomIO` here just for simplicity of the examples. In practice, I would like to process messages received via sockets. — Douglas Vieira, Feb 11 '17 at 06:13
So something like the infinite list of messages received from a socket? — Alec, Feb 11 '17 at 06:14
@Alec Exactly. More specifically, instead of `randomIO`, I am using `receive sock` where `sock` is a ZeroMQ `Socket z` monad (see https://hackage.haskell.org/package/zeromq4-haskell-0.6.5/docs/System-ZMQ4-Monadic.html ) — Douglas Vieira, Feb 11 '17 at 06:37
@DouglasVieira in the `conduit` program, shouldn't you be using `take` inside the conduit? `>>> runConduit $ sourceRandom .| takeC 5 .| sinkList :: IO [Bool]` gives me `[False,False,False,False,True]` Here `sourceRandom` is standing in for your stream of messages, but the pure case works as well. Really a list shouldn't be constructed, you should do whatever you were going to do with it inside the conduit. — Michael, Feb 11 '17 at 23:56
I don't think lazyiness can explain this, by the way. `Maybe` is lazy, but `take 5 <$> sequence (repeat $ Just 1) :: Maybe [Int]` also hangs since if the zillionth element of the list were `Nothing`, it would return `Nothing`. Similarly `take 5 <$> sequence (repeat [1]) :: [[Int]]` would be `[]` if somewhere in the list of lists there's an empty list. — Michael, Feb 12 '17 at 09:35
@Michael Regarding your first comment, see my comment on @duplode's answer. Regarding your second comment, I do not see exactly the point. For the `Maybe` monad, I guess the problem is that only possible fixed point in the monadic context (between `Just a` or `Nothing`) is `Nothing`, so `sequence`ing an infinite list of `Just a`s will just hang forever. `Identity a` does not have this issue because the fixed point of the monadic context is already `Identity a` (the same intuitively would also apply for `IO a` if there were no side-effects). — Douglas Vieira, Feb 12 '17 at 16:54
My point was about strict v lazy. Identity is by the way maximally strict since it is a newtype. I think you just have to work through the definition of `sequence` i.e. `foldr (\m ms -> (:) <$> m <*> ms) (pure [])` together with the monad (applicative) implementation in each concrete case to see why in `Identity` things go so swimmingly and in `IO` (so often) badly. It is basically to avoid `sequence` that we have the streaming libraries. From the point of view of the streaming libraries, `sequence` `mapM` `replicateM` etc. for IO + lists are the fundamental red flags. — Michael, Feb 12 '17 at 17:31
@Michael Now I see, good point! Some examples illustrates the strictness property that you mentioned: `seq (return undefined :: Identity ()) ()`, `seq (return undefined :: Maybe ()) ()`, `seq (return undefined :: IO ()) ()`. As you pointed out, the case with `Identity` throws an error, as it is defined as a newtype. Interestingly, the case with `IO` does not throw an error, although `print 1 >> return ()` does print `1` in the terminal. In [John L's answer](http://stackoverflow.com/a/14496898/6537490), he mentions the strictness of `(>>=)`, which is the one that plays a role here. — Douglas Vieira, Feb 12 '17 at 19:02
With `Control.Monad.Trans.State.Lazy` if I write `a = take 5 <$> sequence (repeat $ return 1) :: StateT () Identity [Int]` I can write `>>> evalStateT a ()` and manage to get `Identity [1,1,1,1,1]` with `>>> runStateT a ()` I get `Identity ([1,1,1,1,1], ` and it pauses while it tries to get the () at the end of the rainbow - so there comes a stack overflow. With `Control.Monad.Trans.State.Strict` both end up in a stack overflow. `sequence` is really pretty toxic for lists, though there are uses of course. — Michael, Feb 12 '17 at 19:26

Alec · Accepted Answer · 2017-02-11T07:09:20.893

I am using randomIO here just for simplicity of the examples. In practice, I would like to process messages received via sockets

That is not possible without unsafeInterleaveIO. The problem at the end of the day is that IO actions must be sequenced. While the order of evaluation for referentially transparent values doesn't matter, that of IO actions may. If you want a lazy infinite list of all the messages received over a socket, you have to inform Haskell a priori where in the sequence of IO actions this fits (unless you use unsafeInterleaveIO).

The Arrow abstraction you are seeking is called ArrowLoop, but it too has a problem with the right tightening law for strict monads.

At first sight, it may look like MonadFix (manifested via mdo or mfix) is a solution too, but digging a bit deeper shows that fixIO has problems, just like loop from ArrowLoop.

However, sometimes the restriction that IO actions must be run one after another is a bit excessive and that is what unsafeInterleaveIO is for. Quoting the docs

unsafeInterleaveIO allows IO computation to be deferred lazily. When passed a value of type IO a, the IO will only be performed when the value of the a is demanded.

Now, even if you explicitly said that you didn't want an unsafeInterleaveIO solution, as I have hopefully managed to convince you it is the way to go, here it is:

interweavingRepeatM :: IO a -> IO [a]
interweavingRepeatM action = unsafeInterleaveIO ((:) <$> action <*> interweavingRepeatM action)

And here is is working for random numbers:

ghci> import System.Random
ghci> sourceOfRandomness <- interweavingRepeatM randomIO :: IO [Integer]
ghci> take 10 sourceOfRandomness
[-2002742716261662204,7803971943047671004,-8395318556488893887,-7372674153585794391,5906750628663631621,6428130029392850107,6453903217221537923,-8966011929671667536,6419977320189968675,-1842456468700051776]

Thanks a lot for this explanation. Could you elaborate more on: 1. How to address this issue for a general monad transformed from `IO`? 2. Why the `mdo` approach gave repeated values? — Douglas Vieira, Feb 11 '17 at 06:55
@DouglasVieira 1. I don't think you can in general - monads are meant to encode that nature of sequencing 2. `mdo` desugars to use `fixIO` and `fixIO` confusingly only evaluates its action _once_ (looking at the [source](https://hackage.haskell.org/package/base-4.9.1.0/docs/src/System.IO.html#fixIO) may be helpful). Even if you had managed to write something that really should have returned a list, you would have hit your head against an even more confusing `MVar` related error message (or worse yet - a hanging program). — Alec, Feb 11 '17 at 07:11
@DouglasVieira With respect to 1. - I see that the package you linked also exposes `IO` functions. You could use those and `unsafeInterleaveIO` to get your list, then `liftIO` back into your desired monad. — Alec, Feb 11 '17 at 07:13

score 3 · Answer 2 · edited Jun 20 '20 at 09:12

Alec's answer covers your general question. What follows is specifically about conduit and similar streaming libraries.

I also tried a Conduit approach which did not work either even in the Identity case:
import Conduit

runConduitPure $ yieldMany [1..] .| sinkList >>= return . take 5

While streaming libraries are commonly used to avoid the sort of difficulty you mention (cf. the opening remarks of Pipes.Tutorial), they assume you will use their stream types instead of lists. Consider, for instance, how sinkList is described by the Conduit documentation:

Consume all values from the stream and return as a list. Note that this will pull all values into memory.

That means using sinkMany immediately after yieldMany brings you back to square one: bringing all values into memory is precisely what makes the combination of sequence, IO and infinite lists unusable. What you should do instead is using the infrastructure of the streaming library to build the stages of your pipeline. Here are a few simple examples using mostly ready-made things from conduit and conduit-combinators:

GHCi> import Conduit
GHCi> runConduitPure $ yieldMany [1..] .| takeC 5 .| sinkList
[1,2,3,4,5]
GHCi> runConduit $ yieldMany [1..] .| takeC 5 .| printC -- try it without takeC
1
2
3
4
5
GHCi> runConduit $ yieldMany [1..] .| takeC 5 .| scanlC (+) 0 .| printC
0
1
3
6
10
15
GHCi> :{
GHCi| runConduit $ yieldMany [1..] .| takeC 5
GHCi|     .| awaitForever (\x -> liftIO (print (2*x)) >> yield x)
GHCi|     .| printC
GHCi| :}
2
1
4
2
6
3
8
4
10
5
GHCi> runConduit $ (sourceRandom :: Producer IO Int) .| takeC 5 .| printC 
1652736016140975126
5518223062916052424
-1236337270682979278
8079753510915129274
-609160753105692151

(Thanks Michael for making me notice sourceRandom -- at first I had rolled my own with repeatMC randomIO.)

Thanks for the clarification regarding `Conduit`. The problem, however, is that is that I want a solution to work on an arbitrary function `f :: [a] -> b` that really operates on (infinite) lists. Therefore, as you pointed out, `Conduit` is not the solution I am looking for. Even worse, as pointed out by @Alec, there is no solution to the general case. To give more context, I was trying to work on a simple implementation of Functional Reactive Programming, where the primitive type `Event :: Ord t => [(t, a)]` is an infinite list ordered by time (i.e. I actually wanted `f :: Event -> b`). — Douglas Vieira, Feb 12 '17 at 15:30
The functions `f` that you actually mention - "(e.g. take 5, takeWhile (< 100) . scanl (+) 0 and so on)" - are elementary stream transformations that all really operate on infinite streams. You haven't given a reason why you are using infinite lists rather than streams, which support almost the whole api of `Data.List` without needing `sequence` and co. That sequence can't be salvaged 'in the general case' is the whole point. — Michael, Feb 12 '17 at 17:50

Operating infinite lists with strict monads

2 Answers2