Different, interacting levels of state in haskell

Question

I'm emulating a 4 bit microprocessor. I need to keep track of the registers, the memory and the running output (bonus points for also having a fetch-execute cycle counter). I've managed to do this without monads, but it feels messy passing around that much stuff at once explicitly. Also the function definition is messy, long and hard to read.

I've tried to do this with monads and it just doesn't fit together. I tried treating all the separate state components as a single type, but that left me with the problem of what to make the value.

State Program () -- Represents the state of the processor after a single iteration of the fetch execute cycle

Was the only type that made any sense. But at that point why even bother? I tried breaking it up by pulling the string out of my composite type and treating it as the value

State Program' String

which worked great, except for the fact that I needed RUNNING output. No matter what I did I couldn't hold on to both the string and the state at the same time.

Now I'm trying to grapple with monad transformers. It seems like I have to separate out all the different levels of state. But my head is exploding fast.

StateT Registers (StateT Memory (State Output)) a = 
StateT (registers -> (StateT Memory (State Output)) (a,registers))

StateT Registers (StateT Memory (State Output)) a = 
StateT (registers -> (Memory -> (Output -> (((a,Registers),Memory),Output))))

I haven't even put in the FEcycle counter yet!

Questions:

Am I on the right track?
Seeing as I'm pulling out monad transformers now, is it possible to stop treating "running output" as state and just palm it off to the IO monad? That would be awesome, instead of holding on to it I could just print it.
How many layers should I separate the state into? I can see two distinct layers, but they depend on each other closely (both the memory and the registers depend on the state of both the memory and the registers). Should I keep them together as a single state or separate them out and stack them up? Which approach will produce the most readable code?

Perhaps "running output" might be best represented using the Writer monad (see http://monads.haskell.cz/html/writermonad.html)? — Jeff Foster, Feb 01 '12 at 09:22

score 10 · Accepted Answer · edited May 23 '17 at 12:16

10

Layering multiple state monads on top of each other is a bad idea: you will have to compose a bunch of lifts to get at each piece of state, identified only by the number of layers down the stack it is. Yuck! Indeed, the mtl library in general is designed to be used, with rare exceptions, with one monad transformer of each "kind" in a stack.

Instead, I would suggest StateT Program IO (). The interface to the state is the same and you can, as you said, do output in IO simply by using liftIO. Sure, the value type is (), but what's wrong with that? There's no relevant value you can return from the top-level emulator. And, of course, you're likely to have smaller, reusable components as part of your emulator, and those will have relevant result types. (Indeed, get is one such component.) There's nothing wrong with having no meaningful return value at the top level.

As far as conveniently accessing each portion of the state, what you're looking for is lenses; this Stack Overflow answer is an excellent introduction. They let you simply and easily access and modify independent parts of your state. For example, with the data-lens implementation, you can easily write something like regA += 1 to increment regA, or stack %= drop 2 to remove the first two elements of the stack.

Sure, it's essentially turning your code into imperative mutation of a set of global variables, but this is actually an advantage, since that's exactly the paradigm the CPU you're emulating is based in. And with the data-lens-template package, you can derive these lenses from a record definition in a single line.

edited May 23 '17 at 12:16

Community

1
1

answered Feb 01 '12 at 09:30

ehird

40,602
3
180
182

That's bloody awesome. Time to learn template haskell. Is it still easy to find the functional definition backing stuff like regA += 1? Cause while that is nice and readable and most clearly expresses the (very imperative) intent, it looks nothing like functional code. – TheIronKnuckle Feb 01 '12 at 10:05
1

You don't have to learn Template Haskell to use data-lens-template; just stick `{-# LANGUAGE TemplateHaskell #-}` at the top of your file and `makeLenses [''Program]` somewhere below the definition of `Program`. As far as the definition of `(+=)`, sure; they're just simple wrappers of the core lens API for `StateT`; [the source](http://hackage.haskell.org/packages/archive/data-lens/2.0.2/doc/html/src/Data-Lens-Strict.html) is linked from the Hackage documentation. – ehird Feb 01 '12 at 10:18
2

Minor quibble - layered state monads are a bad idea - _unless you really want them_. They allow partitioning the state, so you can restrict some clients to only have read access to a particular layer of the state rather than read-write access, this is used for "security conscious" code. Otherwise good answer. – stephen tetley Feb 01 '12 at 18:39
1

@stephentetley: Indeed, thus the "rare exceptions" :) I would tend to wrap such things in `newtype`s to avoid the ambiguity you'd get with `MonadState` definitions, anyway. – ehird Feb 01 '12 at 18:58
The last time I used this design pattern: by the time I finished optimizing it the signature had changed from something like `State Program [Word8]` to something more like `ReaderT (STProgram s) (ST s) [Word8]` – Jeremy List Jun 09 '15 at 06:01

score 2 · Answer 2 · answered Feb 01 '12 at 09:25

A simple way to do this would be to create a data type that represents the registers and memory:

data Register = ...
data Memory = ...
data Machine = Machine [Register] Memory

Then have some functions that update the registers/memory. Now use this type for your state and the output for your type:

type Simulation = State Machine Output

Now each operation could be in the form:

operation previous = do machine <- get
                        (result, newMachine) <- operate on machine
                        put newMachine
                        return result

Here previous is the previous output of the machine. You can incorporate it into the result as well.

So the Machine type represents the state of the machine; you're threading the output of the previous operations through it.

A more sophisticated way would be to use state threads (Control.Monad.ST). These let you use mutable references and arrays inside a function while being guaranteed purity on the outside.

Different, interacting levels of state in haskell

2 Answers2