I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
Abraham Maslow.
How about something different - a unique-supply that isn't a member of the Monad
class. As it happens, you were almost there with your original type signature:
compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]
If the only requirement is that each label is unique - no need to count how many were used, providing the same identifiers given the same circumstances, etc - there's a less-invasive technique you can use.
In IO-free spillable splittable supplies, Luke Palmer shows how value supplies can be encapsulated:
runSupply :: (forall a . Eq a => Supply a -> b) -> b
This avoids having the monadic IO
type taint large parts of the programs which use them: nice! But that isn't the only problem - depending on how they're defined, the onus is on you to use such supplies correctly. For example, assuming:
data Statement =
... | If Statement Statement Statement | ...
then if:
compileStatement (If c t e) s =
case split s of
s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s1)
(compileStatement t s2)
(compileStatement e s3)
is mistakenly changed to:
compileStatement (If c t e) s =
case split s of
s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s)
(compileStatement t s)
(compileStatement e s)
not only are UniqueSupply
and Unique
values being erroneously reused, there's the potential for a space leak if any of the recursive calls to compileStatement
uses the supply intensively.
Unlike Clean, Haskell has no standard way of marking types as monousal. That leaves checks at runtime as the only option: definitely a job for an abstract data type!
Here's a thought - if that ADT was also spittable splittable, we could be able to use it to define an alternate value-supply type. All going well, the values of this new type would then have both properties: slippable splittable and monousal.
Looking at Data.Supply
reveals the use of a binary-tree type - the module and definitions seem to be based on the following example from the functional pearl [On generating unique names], written by Lennart Augustsson, Mikael Rittri and Dan Synek - from page 4 of 7:
module HideGensym(
Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply)
where
gensym :: a -> Int -- implemented in assembler
data Name = MkName Int deriving (Eq)
data NameSupply = MkNameSupply Name NameSupply NameSupply
initialNameSupply = gen ()
where gen x = MkNameSupply (MkName (gensym x)) (gen x) (gen x)
getNameDeplete (MkNameSupply n s1 _) = (n, s1)
splitNameSupply (MkNameSupply _ s1 s2) = (s1, s2)
...gensym
: we'll leave that one for now. Let's look at how we can insert the new ADT into NameSupply
...after we attend to a more mundane matter: - two identical calls to gen
- MkNameSupply ... (gen x) (gen x)
- to an optimising Haskell implementation, they're the same value:
-- same function, same argument, same result: what's the matter?
initialNameSupply = gen ()
where gen x = let s = gen x in
MkNameSupply (MkName (gensym x)) s s
Then again, maybe we can solve both problems at once:
initialNameSupply = runUO gen
where gen u = let !(u1, u2) = splitUO2 u in
MkNameSupply (MkName (gensym ())) (gen u1) (gen u2)
where UO
will be our new use-once split-ready abstract-data type:
module UO(
UO, initUO, splitUO, splitUO2, ...
) where
data UO s ...
runUO :: (forall s . UO s -> a) -> a
splitUO :: UO s -> [UO s]
splitUO2 :: UO s -> (UO s, UO s)
⋮
...which can also be encapsulated.
(Surely there must be a better word than spiltable splittable in the English language...)
Now for the gensym
problem - let's start with this cautionary remark about that HideGensym
module, also on page 4 of 7:
The gensym
[thing] must be coded in assembler, and possibly also the gen
function.
...otherwise that single call to gensym
might be lifted all the way out: remember gen ()
?
{- WRONG! -}
initialNameSupply = runUO gen
where gen u = let !(u1, u2) = splitUO2 u in
MkNameSupply (MkName x) (gen u1) (gen u2)
x = gensym ()
Since gensym
(supposedly!) accepts any type of input:
gensym :: a -> Int -- implemented in assembler
this shouldn't break anything:
initialNameSupply = runUO gen
where gen u = let !(u1:u2:u3:_) = splitUO u in
MkNameSupply (MkName (gensym u1)) (gen u2) (gen u3)
As a bonus, we can make a slightly-more generic version of initialNameSupply
:
initialNameSupply = initialSupply gensym
initialSupply :: (UO s -> Int) -> NameSupply
initialSupply g = runUO gen
where gen u = let !(u1:u2:u3:_) = splitUO u in
MkNameSupply (MkName (g u1)) (gen u2) (gen u3)
(...alright, so gensym
is still there - at least now it's isolated.)
By now you've probably noticed the other example module OneTimeSupplies
, with its own cautionary remark:
It is referentially transparent only if each supply is used at most once.
In addition, back on page 3 of 7:
If a compile-time analysis of a program can guarantee that every name supply is used at most once, either to do getNameDeplete
or splitNameSupply
, the tree becomes unnecessary [...]
Since we're relying on UO
to provide the same guarantee, can we also go tree-free in our implementation and save some work?
To do that, supplyValue
and the split
s will need an upgrade:
the simplest option for supplyValue
is to provide it with the generator (g
in initialSupply
).
data NameSupply = forall s . Supply (UO s -> Int) ...
supplyValue :: NameSupply -> Name
supplyValue (Supply g ...) = MkName (g ...)
as for the split
s, they require an UO
value so they can obtain the new UO
values needed by the new supplies:
data NameSupply = forall s . Supply (UO s) ...
split :: NameSupply -> [NameSupply]
split (Supply u ...) = [ Supply v ... | v <- splitUO u ]
split2 :: NameSupply -> (NameSupply, NameSupply)
split2 (Supply u ...) = let !(u1, u2) = splitUO2 u in
(Supply u1 ..., Supply u2 ...)
That clearly suggests:
data NameSupply = forall s . Supply (UO s -> Int) (UO s)
supplyValue (Supply g u) = MkName (g u)
split (Supply g u) = [ Supply g v | v <- splitUO u ]
split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
(Supply g u1, Supply g u2)
But does it also work for initialNameSupply
?
initialNameSupply = initialSupply gensym
initialSupply :: (UO s -> Int) -> NameSupply
initialSupply = runUO . Supply
It gets better:
type NameSupply = Supply Name
data Name = MkName Int deriving (Eq)
initialNameSupply = initialSupply (MkName . gensym)
-- NameSupply --
-- ================ --
-- Supply --
data Supply a = forall s . Supply (UO s -> a) (UO s)
instance Functor Supply where
fmap f (Supply g u) = Supply (f . g) u
supplyValue :: Supply a -> a
supplyValue (Supply g u) = g u
split :: Supply a -> [Supply a]
split (Supply g u) = [ Supply g v | v <- splitUO u ]
split2 :: Supply a -> (Supply a, Supply a)
split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
(Supply g u1, Supply g u2)
initialSupply :: (UO s -> a) -> NameSupply
initialSupply = runUO . Supply
This is very promising, if UO
and associates can be defined as intended...
If you've read the post by Luke Palmer, you already know that he uses an ugly
unsafe
entity to define runSupply
. Well, right now (2022 Jan) runST
is defined in similar fashion:
runST :: (forall s. ST s a) -> a
runST (ST st_rep) = case runRW# st_rep of (# _, a #) -> a
where:
newtype ST s a = ST (STRep s a)
type STRep s a = State# s -> (# State# s, a #
runRW# :: STRep RealWorld a -> (# State# RealWorld, a #)
Can UO
be defined without resorting to such measures? That's probably worthy of a separate answer - for now, we'll just tolerate the ugliness:
{-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-}
module UO(
UO, runUO, splitUO, splitUO2,
useUO, asUO,
) where
import Prelude (String, Eq(..))
import Prelude ((.), ($), (++), error, all)
import Data.Char (isSpace)
import GHC.Base (State#, MutVar#)
import GHC.Base (runRW#, newMutVar#, noDuplicate#)
import GHC.Exts (atomicModifyMutVar#)
import GHC.ST (ST(..), STRep)
data UO s = UO (UO# s)
runUO :: (forall s . UO s -> a) -> a
runUO g = let (# _, r #) = runRW# (useUO# (g . UO)) in r
splitUO :: UO s -> [UO s]
splitUO u = let !(u1, u2) = splitUO2 u in u1 : splitUO u
splitUO2 :: UO s -> (UO s, UO s)
splitUO2 (UO h) = let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2)
useUO :: (UO s -> a) -> ST s a
useUO g = ST (\s -> useUO# (g . UO) s)
asUO :: Eq a => String -> ST s a -> UO s -> a
asUO name (ST act) (UO h)
= asUO# name act h
-- local definitions --
type UO# s = String -> State# s
splitUO2# :: UO# s -> (# UO# s, UO# s #)
splitUO2# h = let !s = h "splitUO2"
(# s', h1 #) = dispense# s
(# _, h2 #) = dispense# s'
in (# h1, h2 #)
useUO# :: (UO# s -> a) -> STRep s a
useUO# g s = let (# s', h #) = dispense# s
!r = g h
in (# s', r #)
dispense# :: STRep s (UO# s)
dispense# s = let (# s', r #) = newMutVar# () s
in (# s', expire# s' r #)
expire# :: State# s -> MutVar# s () -> String -> State# s
expire# s r name = let (# s', () #) = atomicModifyMutVar# r use s
in s'
where
use x = (error nowUsed, x)
nowUsed = name' ++ ": already expired"
name' = if all isSpace name then "(unknown)"
else name
asUO# :: Eq a => String -> STRep s a -> UO# s -> a
asUO# name act h = let (# _, t #) = act (noDuplicate# (h name)) in t
It's a little more complicated than strictly necessary (e.g. rudimentary reuse-error reporting) but in exchange for that, UO
-based definitions can now manipulate local state...
There's one other definition in Data.Supply
to implement:
newSupply :: a -> (a -> a) -> IO (Supply a)
newSupply start next = gen =<< newIORef start
where gen r = unsafeInterleaveIO
$ do v <- unsafeInterleaveIO (atomicModifyIORef r upd)
ls <- gen r
rs <- gen r
return (Node v ls rs)
upd a = let b = next a in seq b (b, a)
as it would end the need for gensym
. It is vaguely similar to initialSupply
- can that be made more apparent?
gen
in the original initialNameSupply
doesn't have a reference parameter r
:
newSupply start next = do r <- newIORef start
let gen = unsafeInterleaveIO $
do v <- unsafeInterleaveIO (atomicModifyIORef r upd)
ls <- gen
rs <- gen
return (Node v ls rs)
gen
where upd a = let b = next a in seq b (b, a)
the value-action unsafeInterleaveIO (atomicModifyIORef r upd)
performs the role of gensym
in the original initialNameSupply
:
newSupply start next = do r <- newIORef start
let gen = unsafeInterleaveIO $
do v <- genval
ls <- gen
rs <- gen
return (Node v ls rs)
genval = unsafeInterleaveIO (atomicModifyIORef r upd)
gen
where upd a = let b = next a in seq b (b, a)
gen
in the original initialNameSupply
had no need of do
-notation:
newSupply start next = do r <- newIORef start
let gen = unsafeInterleaveIO (liftM3 Node genval gen gen)
genval = unsafeInterleaveIO (atomicModifyIORef r upd)
gen
where upd a = let b = next a in seq b (b, a)
does genval
have to be in that let
-binding?
newSupply start next = do r <- newIORef start
let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
gen
where genval r = unsafeInterleaveIO (atomicModifyIORef r upd)
upd a = let b = next a in seq b (b, a)
upd
is only used in genval
:
newSupply start next = do r <- newIORef start
let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
gen
where genval r = let upd a = let b = next a in seq b (b, a)
in unsafeInterleaveIO (atomicModifyIORef r upd)
can some content in genval
be moved to a separate definition?
newSupply start next = do r <- newIORef start
let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
gen
where genval r = unsafeInterleaveIO (nextValue r next)
nextValue :: IORef a -> (a -> a) -> IO a
nextValue r next = let upd a = let b = next a in seq b (b, a)
in atomicModifyIORef r upd
Now that it more clearly resembles the original initialNameSupply
, re-implementing newSupply
using our new Supply
type is relatively simple - first, a change of monadic type:
newSupply start next = do r <- newSTRef start
let gen = unsafeInterleaveST (liftM3 Node (genval r) gen gen)
gen
where genval r = unsafeInterleaveST (nextValue r next)
nextValue :: STRef s a -> (a -> a) -> ST s a
nextValue r next = let upd a = let b = next a in seq b (b, a)
in atomicModifyST r upd
No other changes are needed for nextValue
. As for newSupply
:
newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a)
newSupply start next = do r <- newSTRef start
let g = asUO "genval" (genval r)
useUO (Supply g)
where genval r = nextValue r next
which can then be used to define our version of runSupply
:
runSupply :: (forall a . Eq a => Supply a -> b) -> b
runSupply f = f (runST (newSupply (0 :: Int) succ))
Can we now, finally, expel gensym
from the NameSupply
type?
initialNameSupply :: NameSupply
initialNameSupply = fmap MkName (initialSupply 0 succ)
initialSupply :: Eq a => a -> (a -> a) -> Supply a
initialSupply start next = runST (newSupply start next)
Yes.
Here are all the pertinent definitions, arranged into modules:
ExpelGensym
, the replacement for HideGensym
on page 4 of 7:
{-# LANGUAGE BangPatterns #-}
module ExpelGensym(
Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply
) where
import Control.Monad (liftM)
import Control.Monad.ST (runST)
import Supply (Supply, newSupply, supplyValue, split2)
data Name = MkName Int deriving (Eq)
type NameSupply = Supply Name
initialNameSupply :: Supply Name
initialNameSupply = fmap MkName (initialSupply 0 succ)
getNameDeplete :: NameSupply -> (Name, NameSupply)
getNameDeplete s = let !(s1, s2) = split2 s in (supplyValue s1, s2)
splitNameSupply :: NameSupply -> (NameSupply, NameSupply)
splitNameSupply = split2
-- local definitions --
initialSupply :: Eq a => a -> (a -> a) -> Supply a
initialSupply start next = runST (newSupply start next)
Supply
, our miniature implementation of Data.Supply
:
{-# LANGUAGE BangPatterns, ExistentialQuantification, RankNTypes #-}
module Supply(
Supply, newSupply, runSupply, supplyValue, split, split2
) where
import Control.Monad.ST
import Data.STRef
import UO
data Supply a = forall s . Supply (UO s -> a) (UO s)
instance Functor Supply where
fmap f (Supply g u) = Supply (f . g) u
newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a)
newSupply start next = do r <- newSTRef start
let g = asUO "genval" (genval r)
useUO (Supply g)
where genval r = nextValue r next
runSupply :: (forall a . Eq a => Supply a -> b) -> b
runSupply f = f (runST (newSupply (0 :: Int) succ))
supplyValue :: Supply a -> a
supplyValue (Supply g u) = g u
split :: Supply a -> [Supply a]
split (Supply g u) = [ Supply g v | v <- splitUO u ]
split2 :: Supply a -> (Supply a, Supply a)
split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
(Supply g u1, Supply g u2)
-- local definitions --
nextValue :: STRef s a -> (a -> a) -> ST s a
nextValue r next = let upd a = let b = next a in seq b (b, a)
in atomicModifySTRef r upd
{-
-- if your Haskell installation doesn't define it --
atomicModifySTRef :: STRef s a -> (a -> (a, b)) -> ST s b
atomicModifySTRef r f = do x <- readSTRef r
let !(x', y) = f x
writeSTRef r x'
return y
-}
UO
, that use-once split-ready abstract-data type:
{-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-}
module UO(
UO, runUO, splitUO, splitUO2,
useUO, asUO,
) where
import Prelude (String, Eq(..))
import Prelude ((.), ($), (++), error, all)
import Data.Char (isSpace)
import GHC.Base (State#, MutVar#)
import GHC.Base (runRW#, newMutVar#, noDuplicate#)
import GHC.Exts (atomicModifyMutVar#)
import GHC.ST (ST(..), STRep)
data UO s = UO (UO# s)
runUO :: (forall s . UO s -> a) -> a
runUO g = let (# _, r #) = runRW# (useUO# (g . UO)) in r
splitUO :: UO s -> [UO s]
splitUO u = let !(u1, u2) = splitUO2 u in u1 : splitUO u
splitUO2 :: UO s -> (UO s, UO s)
splitUO2 (UO h) = let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2)
useUO :: (UO s -> a) -> ST s a
useUO g = ST (\s -> useUO# (g . UO) s)
asUO :: Eq a => String -> ST s a -> UO s -> a
asUO name (ST act) (UO h)
= asUO# name act h
-- local definitions --
type UO# s = String -> State# s
splitUO2# :: UO# s -> (# UO# s, UO# s #)
splitUO2# h = let !s = h "splitUO2"
(# s', h1 #) = dispense# s
(# _, h2 #) = dispense# s'
in (# h1, h2 #)
useUO# :: (UO# s -> a) -> STRep s a
useUO# g s = let (# s', h #) = dispense# s
!r = g h
in (# s', r #)
dispense# :: STRep s (UO# s)
dispense# s = let (# s', r #) = newMutVar# () s
in (# s', expire# s' r #)
expire# :: State# s -> MutVar# s () -> String -> State# s
expire# s r name = let (# s', () #) = atomicModifyMutVar# r use s
in s'
where
use x = (error nowUsed, x)
nowUsed = name' ++ ": already expired"
name' = if all isSpace name then "(unknown)"
else name
asUO# :: Eq a => String -> STRep s a -> UO# s -> a
asUO# name act h = let (# _, t #) = act (noDuplicate# (h name)) in t