I'm writing a very simple two-pass assembler in Haskell and I've come across a scenario that I don't yet have the experience to solve. I think the solution is likely to involve monad transformers, which I don't really understand.
The assembler parses the assembly code into a list of `Statement`s, which are either instructions or labels. Some `Statement`s may refer to labels. The assembler needs to convert the `Statement`s into `Instruction`s, which involves eliminating the labels and substituting the label references with an appropriate value.
I have written the first pass of the assembler, which produces a `[(String, Int)]` representing a map from labels to addresses. I have also written the following function for translating a `Statement` into an `Instruction`:
    stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
    stmtToInstruction addr labels stmt = case stmt of
        ADD d s1 s2  -> Right [I.ADD d s1 s2]
        BEQL s1 s2 l -> case do
                label <- find (\e -> fst e == l) labels
                let labelAddr = snd label
                let relativeAddr = I.ImmS $ fromIntegral (labelAddr - addr)
                return (I.BEQ s1 s2 relativeAddr) of
            Just i  -> Right [i]
            Nothing -> Left $ "Label " ++ l ++ " not defined"
        LABEL _      -> Right []
I've omitted several cases for brevity, but you can see all the possible results here:
- `ADD` always succeeds and produces an instruction
- `BEQL` can either succeed or fail, depending on whether a label is found
- `LABEL` always succeeds, even though it produces no actual instructions
This works as expected. The problem I now have is writing this function:
    replaceLabels :: [Statement] -> Either String [I.Instruction]
`replaceLabels` takes a list of statements, and runs `stmtToInstruction` on each one. The `addr` argument to `stmtToInstruction` must be the length of the `[Instruction]` accumulated so far. The output may either be a `Left String`, if one of the label references was invalid, or a `Right [I.Instruction]`, if there were no errors.
`mapM :: Monad m => (a -> m b) -> [a] -> m [b]` gets us some of the way there, but provides no way to inject the current address into the `(a -> m b)` function. How do I make this work?
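To make the requirement concrete, here is a minimal sketch of one possible way to thread the address, using `foldM` from `Control.Monad` rather than `mapM`. Everything except the constructor names `ADD`, `BEQL` and `LABEL` is an assumption made for illustration: `Reg`, the simplified `Instruction` type (standing in for the real `I.Instruction`), and the name `replaceLabelsWith`, which takes the label table as an explicit argument, are all hypothetical.

```haskell
import Control.Monad (foldM)

-- Hypothetical stand-ins for the real Statement and I.Instruction types.
type Reg = Int

data Statement
  = ADD Reg Reg Reg
  | BEQL Reg Reg String
  | LABEL String
  deriving (Show, Eq)

data Instruction
  = IADD Reg Reg Reg
  | IBEQ Reg Reg Int
  deriving (Show, Eq)

-- Same shape as the function in the question, simplified to use lookup.
stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [Instruction]
stmtToInstruction addr labels stmt = case stmt of
  ADD d s1 s2  -> Right [IADD d s1 s2]
  BEQL s1 s2 l -> case lookup l labels of
    Just labelAddr -> Right [IBEQ s1 s2 (labelAddr - addr)]
    Nothing        -> Left ("Label " ++ l ++ " not defined")
  LABEL _      -> Right []

-- The fold state is (next address, instructions emitted so far, reversed).
-- foldM runs in the Either monad, so the first Left aborts the whole fold.
replaceLabelsWith :: [(String, Int)] -> [Statement] -> Either String [Instruction]
replaceLabelsWith labels stmts = reverse . snd <$> foldM step (0, []) stmts
  where
    step (addr, acc) stmt = do
      is <- stmtToInstruction addr labels stmt
      return (addr + length is, reverse is ++ acc)
```

With the label table `[("loop", 0)]`, the statements `[LABEL "loop", ADD 1 2 3, BEQL 1 2 "loop"]` give `Right [IADD 1 2 3, IBEQ 1 2 (-1)]`, while a reference to an undefined label `x` gives `Left "Label x not defined"`. The same state threading could presumably also be expressed with `StateT Int (Either String)`, which is where monad transformers would come in, but the explicit fold keeps the sketch dependency-free.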