I have a Producer that creates values that depend on randomness, using my own Random monad:
policies :: Producer (Policy s a) Random x
Random is a wrapper over mwc-random that can be run from ST or IO:
newtype Random a =
Random (forall m. PrimMonad m => Gen (PrimState m) -> m a)
runIO :: Random a -> IO a
runIO (Random r) = MWC.withSystemRandom (r @IO)
The policies producer yields better and better policies from a simple reinforcement learning algorithm.
I can efficiently plot the policy after, say, 5,000,000 iterations by indexing into policies:
Just convergedPolicy <- Random.runIO $ Pipes.index 5000000 policies
plotPolicy convergedPolicy "policy.svg"
I now want to plot the intermediate policies every 500,000 steps to see how they converge. I wrote a couple of functions that take the policies producer and extract a list ([Policy s a]) of, say, 10 policies, one every 500,000 iterations, and then plot all of them.
However, these functions take far longer (10x) and use more memory (4x) than just plotting the final policy as above, even though the total number of learning iterations should be the same (i.e. 5,000,000). I suspect this is because extracting a list inhibits the garbage collector, and it seems to be an unidiomatic use of Pipes:
Idiomatic pipes style consumes the elements immediately as they are generated instead of loading all elements into memory.
What's the correct approach to consuming a pipe like this when the Producer is over some random monad (i.e. Random) and the effect I want to produce is in IO?
Put another way, I want to plug a Producer (Policy s a) Random x into a Consumer (Policy s a) IO x.
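To make the shape of the problem concrete, here is a minimal sketch of one way the two ends might be connected. It is an illustration under stated assumptions, not a confirmed idiomatic solution. It assumes Random has the usual Functor/Applicative/Monad instances (written out below), moves both sides into a common ReaderT GenIO IO monad with hoist, and shares a single generator from createSystemRandom. The names toReader, runStreaming, everyNth and samples are hypothetical, and a stream of Doubles stands in for the policies. Note that hoisting with runIO directly would call withSystemRandom, and so create a fresh generator, on every step.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Control.Monad (forever, replicateM_)
import Control.Monad.Primitive (PrimMonad, PrimState)
import Control.Monad.Trans.Reader (ReaderT (..))
import Pipes
import qualified Pipes.Prelude as P
import System.Random.MWC (Gen, GenIO, createSystemRandom, uniform)

-- Same representation as in the question.
newtype Random a =
  Random (forall m. PrimMonad m => Gen (PrimState m) -> m a)

-- Assumed instances: Random behaves like a reader over the generator.
instance Functor Random where
  fmap f (Random r) = Random (\g -> fmap f (r g))

instance Applicative Random where
  pure x = Random (\_ -> pure x)
  Random f <*> Random x = Random (\g -> f g <*> x g)

instance Monad Random where
  Random r >>= k = Random (\g -> r g >>= \a -> case k a of Random r' -> r' g)

-- Hypothetical: reinterpret a Random action as a reader over an IO generator.
toReader :: Random a -> ReaderT GenIO IO a
toReader (Random r) = ReaderT r

-- Hypothetical: forward only every n-th element, discarding the rest.
everyNth :: Monad m => Int -> Pipe a a m r
everyNth n = forever $ do
  replicateM_ (n - 1) await
  await >>= yield

-- Hypothetical: lift both ends into ReaderT GenIO IO and run them together,
-- so each element is consumed as soon as it is produced.
runStreaming :: Producer a Random r -> Consumer a IO r -> IO r
runStreaming producer consumer = do
  gen <- createSystemRandom
  runReaderT (runEffect (hoist toReader producer >-> hoist lift consumer)) gen

-- Stand-in for the policies producer: an endless stream of random Doubles.
samples :: Producer Double Random r
samples = forever $ lift (Random uniform) >>= yield

main :: IO ()
main = runStreaming (samples >-> P.take 50) (everyNth 10 >-> P.print)
```

With the real producer, the consumer would presumably be something like everyNth 500000 >-> P.mapM_ plotOne, where plotOne is a hypothetical IO action that writes one plot. No list of policies is ever built, so each policy can be collected once it has been plotted.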