1

I need to make each instance of Sphere get a unique identifier so that no two Spheres are equal. I won't know ahead of time how many spheres I'll need to make so will need to make them one at a time, but still increment the identifier.

Most solutions I've tried have this issue where I end up with an IO a and need the unsafePerformIO to get the value.

This code comes close, but the resulting identifier is always the same:

module Shape ( Sphere (..)
             , sphere
             , newID
             ) where

import System.Random
import System.IO.Unsafe (unsafePerformIO)

data Sphere = Sphere { identifier :: Int
                     } deriving (Show, Eq)

sphere :: Sphere
sphere = Sphere { identifier = newID }

newID :: Int
newID = unsafePerformIO (randomRIO (1, maxBound :: Int))

This would work as well, and works great in the REPL, but when I put it in a function, it only returns a new value the first time and the same value after that.

import Data.Unique
sphere = Sphere { identifier = (hashUnique $ unsafePerformIO newUnique) }

I know think this all leads to the State Monad, but I don't understand that yet. Is there no other way that will "get the job done", without biting off all the other monad stuff?

Craig Treptow
  • 834
  • 7
  • 19
  • 1
    This is indeed something for which state-like monads are used. I know that is quite annoying. But once you understand what the benefits of such encapsulation are, it is actually quite effective. – Willem Van Onsem Jul 30 '20 at 19:40
  • 3
    `unsafePerformIO` is one of the trickiest things to use correctly in Haskell. It's almost always the hardest option to get correct for any particular way to solve a problem. Unless you absolutely need it for some concrete reason, you're better off learning something new to avoid it. – Carl Jul 30 '20 at 20:11
  • 2
    The easiest thing to do is just accept that `newID` cannot have type `Int` - it must have type `IO Int` (same as you have now but without the `unsafePerformIO`). It fundamentally isn't a "pure value" so you shouldn't try to pretend it is. Anything else you have that depends on that random integer can be written as a function that takes an `Int` as argument - then in your final program (which runs in `IO` anyway) you can generate the random number with `newID`, extract the result, and feed it to your other functions. – Robin Zigmond Jul 30 '20 at 21:15
  • 3
    Don't use `unsafe` functions, period. Those are marked as "unsafe" for a reason. Using those amounts to 1) force the compiler into assuming some properties on the code, and 2) be willing to face the horrible consequences if the assumption did not hold. In your code, you promise that that IO action will always produce the same result, no matter how many times it's called, so it can be regarded as a pure value. The compiler is then free to run the IO action any number of times, or only once and cache the result. – chi Jul 30 '20 at 23:52
  • unsafePerformIO lacks all guarantees. For example, it lacks the guarantee that running in repl and running in compiled code does the same thing. – PyRulez Jul 31 '20 at 18:07
  • Possibly related: https://stackoverflow.com/questions/6311512/creating-unique-labels-in-haskell/63406066 . – atravers Sep 21 '20 at 03:13

1 Answers1

6

First of all, don’t use unsafePerformIO here. It doesn’t do what you want anyway: it doesn’t “get the a out of an IO a”, since an IO a doesn’t contain an a; rather, unsafePerformIO hides an IO action behind a magical value that executes the action when somebody evaluates the value, which could happen multiple times or never because of laziness.

Is there no other way that will "get the job done", without biting off all the other monad stuff?

Not really. You’re going to have to maintain some kind of state if you want to generate unique IDs. (You may be able to avoid needing unique IDs altogether, but I don’t have enough context to say.) State can be handled in a few ways: manually passing values around, using State to simplify that pattern, or using IO.

Suppose we want to generate sequential IDs. Then the state is just an integer. A function that generates a fresh ID can simply take that state as input and return an updated state. I think you’ll see straight away why that’s too simple, so we tend to avoid writing code like this:

-- Differentiating “the next-ID state” from “some ID” for clarity.
newtype IdState = IdState Id

type Id = Int

-- Return new sphere and updated state.
newSphere :: IdState -> (Sphere, IdState)
newSphere s0 = let
  (i, s1) = newId s0
  in (Sphere i, s1)

-- Return new ID and updated state.
newId :: IdState -> (Id, IdState)
newId (IdState i) = (i, IdState (i + 1))

newSpheres3 :: IdState -> ((Sphere, Sphere, Sphere), IdState)
newSpheres3 s0 = let
  (sphere1, s1) = newSphere s0
  (sphere2, s2) = newSphere s1
  (sphere3, s3) = newSphere s2
  in ((sphere1, sphere2, sphere3), s3)

main :: IO ()
main = do

  -- Generate some spheres with an initial ID of 0.
  -- Ignore the final state with ‘_’.
  let (spheres, _) = newSpheres3 (IdState 0)

  -- Do stuff with them.
  print spheres

Obviously this is very repetitive and error-prone, since we have to pass the correct state along at each step. The State type has a Monad instance that abstracts out this repetitive pattern and lets you use do notation instead:

import Control.Monad.Trans.State (State, evalState, state)

newSphere :: State IdState Sphere
newSphere = do
  i <- newId
  pure (Sphere i)
-- or:
-- newSphere = fmap Sphere newId
-- newSphere = Sphere <$> newId

-- Same function as before, just wrapped in ‘State’.
newId :: State IdState Id
newId = state (\ (IdState i) -> (i, IdState (i + 1)))

-- Much simpler!
newSpheres3 :: State IdState (Sphere, Sphere, Sphere)
newSpheres3 = do
  sphere1 <- newSphere
  sphere2 <- newSphere
  sphere3 <- newSphere
  pure (sphere1, sphere2, sphere3)
  -- or:
  -- newSpheres3 = (,,) <$> newSphere <*> newSphere <*> newSphere

main :: IO ()
main = do

  -- Run the ‘State’ action and discard the final state.
  let spheres = evalState newSpheres3 (IdState 0)

  -- Again, do stuff with the results.
  print spheres

State is what I would reach for normally, since it can be used within pure code, and combined with other effects without much trouble using StateT, and because it’s actually immutable under the hood, just an abstraction on top of passing values around, you can easily and efficiently save and roll back states.

If you want to use randomness, Unique, or make your state actually mutable, you generally have to use IO, because IO is specifically about breaking referential transparency like that, typically by interacting with the outside world or other threads. (There are also alternatives like ST for putting imperative code behind a pure API, or concurrency APIs like Control.Concurrent.STM.STM, Control.Concurrent.Async.Async, and Data.LVish.Par, but I won’t go into them here.)

Fortunately, that’s very similar to the State code above, so if you understand how to use one, it should be easier to understand the other.

With random IDs using IO (not guaranteed to be unique):

import System.Random

newSphere :: IO Sphere
newSphere = Sphere <$> newId

newId :: IO Id
newId = randomRIO (1, maxBound :: Id)

newSpheres3 :: IO (Sphere, Sphere, Sphere)
newSpheres3 = (,,) <$> newSphere <*> newSphere <*> newSphere

main :: IO ()
main = do
  spheres <- newSpheres3
  print spheres

With Unique IDs (also not guaranteed to be unique, but unlikely to collide):

import Data.Unique

newSphere :: IO Sphere
newSphere = Sphere <$> newId

newId :: IO Id
newId = hashUnique <$> newUnique

-- …

With sequential IDs, using a mutable IORef:

import Data.IORef

newtype IdSource = IdSource (IORef Id)

newSphere :: IdSource -> IO Sphere
newSphere s = Sphere <$> newId s

newId :: IdSource -> IO Id
newId (IdSource ref) = do
  i <- readIORef ref
  writeIORef ref (i + 1)
  pure i

-- …

You’re going to have to understand how to use do notation and functors, applicatives, and monads at some point, because that’s just how effects are represented in Haskell. You don’t necessarily need to understand every detail of how they work internally in order to just use them, though. I got pretty far when I was learning Haskell with some rules of thumb, like:

  • A do statement can be:

    • An action: (action :: m a)

      • Often m () in the middle

      • Often pure (expression :: a) :: m a at the end

    • A let binding for expressions: let (var :: a) = (expression :: a)

    • A monadic binding for actions: (var :: a) <- (action :: m a)

  • f <$> action applies a pure function to an action, short for do { x <- action; pure (f x) }

  • f <$> action1 <*> action2 applies a pure function of multiple arguments to multiple actions, short for do { x <- action1; y <- action2; pure (f x y) }

  • action2 =<< action1 is short for do { x <- action1; action2 x }

Jon Purdy
  • 53,300
  • 8
  • 96
  • 166
  • 2
    Extra +1 because we're only allowed to vote once. Using `unsafePerformIO` to avoid monads is like lighting your house on fire to get the back door open: it's not going to work, and there's no small danger of disastrous consequences. Embrace the monads! You cannot resist their siren song! – K. A. Buhr Jul 30 '20 at 22:37
  • Thank you Jon Purdy, This looks good. Might take me a bit to grok it, but your rules of thumb look very useful! – Craig Treptow Jul 31 '20 at 17:06
  • @CraigTreptow: You’re welcome! I also recommend running [HLint](https://hackage.haskell.org/package/hlint) on your code—not only is it the standard linter for real-world Haskell projects, but I also learned a lot from it when I was learning Haskell because it can offer many useful suggestions, including the ones I listed at the end for replacing common `do` notation patterns with idiomatic operators. – Jon Purdy Jul 31 '20 at 17:21
  • @JonPurdy Oh I had no idea it would do that! Do you have a recommended formatter as well? – Craig Treptow Jul 31 '20 at 18:13
  • @JonPurdy I'm going through this and I think I'm understanding, but *seem* to have the same issue. Your code *does* increment the id when the state is initalized with 0 and they are created all at once. I don't know how many spheres I'll need, so, creating them one at a time with your code gives them all the id of 0 because I initialize the state with 0 each time. I'll still need something to keep track of that counter and increment it each time I need a sphere. Am I misunderstanding? Is there a way around that? – Craig Treptow Jul 31 '20 at 19:46
  • @CraigTreptow: Ah right, you need to stay in the context of `State IdState` and only call {`run`/`eval`/`exec`}`State` once if you want it to be shared. `State` gives you a single mutable value (which may be a compound value like a record) within the scope of a `runState`. If you need to interleave other effects like `IO`, you *could* use a transformer `StateT IdState IO`, but it might be simpler to use plain `IO` with either `Unique` or `IORef`. You can generate a *dynamic* number of objects with functions like `replicateM :: (Monad m) => Int -> m a -> m [a]`, e.g. `replicateM 10 newSphere`. – Jon Purdy Jul 31 '20 at 20:03
  • @CraigTreptow: I would add: don’t be afraid of using `IO`! Making code referentially transparent certainly has a lot of *advantages*, but it’s not a *requirement*; it’s certainly better to get something done *with* `IO`, *then* figure out how to do without it, than to struggle to get anything done at all. As for formatting, we use [brittany](https://github.com/lspitzner/brittany) at my job, and I’ve heard good things about [hindent](https://hackage.haskell.org/package/hindent) too, but I’m not super familiar with the tradeoffs between different formatting tools. – Jon Purdy Jul 31 '20 at 20:12
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/219012/discussion-between-craig-treptow-and-jon-purdy). – Craig Treptow Jul 31 '20 at 20:30