How exactly does `IO`'s >>= work under the hood?

Question

When explaining a concept like Monads to a beginner, I think it is helpful to avoid any complicated Haskell terminology, or anything category theory-like. I think a nice way to explain it is to build up a motivation for the function a -> m b with a straightforward type like Maybe:

data Maybe = Just a | Nothing

It's all or nothing. But what if we have some functions f :: a -> Maybe b and g :: b -> Maybe c and we want a nice way to combine them?

andThen :: Maybe a -> (a -> Maybe b) -> Maybe b
andThen Nothing _ = Nothing
andThen (Just a) f = f a

comp :: Maybe Text
comp = f a `andThen` g
  where f g a = etc...

You can then move into saying andThen could be defined for a variety of types (eventually forming the monad typeclass)... a compelling next example to me would be IO. But how would you define andThen for IO yourself? This has lead me to a question of my own... my naive implementation of andThenIO would be like so:

andThenIO :: IO a -> (a -> IO b) -> IO b
andThenIO io f = f (unsafePerformIO io)

But I know this isn't what is actually going on when you >>= using IO. Looking at the implementation of bindIO in GHC.Base I see this:

bindIO :: IO a -> (a -> IO b) -> IO b
bindIO (IO m) k = IO (\ s -> case m s of (# new_s, a #) -> unIO (k a) new_s)

And for unIO this:

unIO :: IO a -> (State# RealWorld -> (# State# RealWorld, a #))
unIO (IO a) = a

Which seems to relate to the ST monad somehow, though my knowledge of ST is next to nothing... I suppose my question is, what exactly is the difference between my naive implementation, and an implementation that uses ST? Is my naive implementation useful given the example or not given it isn't actually going on under the hood (and could be a misleading explanation)

You can't implement `IO` on your own using only operations guaranteed by the Haskell Report (except, of course, `andThen = (>>=)`, which doesn't really address the pedagogical goals). You have to think of `IO` as built-into-the-compiler magic... because it is. Even the "code" you see when you click source links on Hackage which appears to implement IO eventually bottoms out at calls to things that are even more primitive magic, and which require the existence of the GHC runtime (written in C). — Daniel Wagner, Aug 09 '18 at 15:33
The IO implementation in base isn't the real implementation of `IO`; That all happens through compiler magic. You _can_ implement a sort of fake IO with simulated files and in/outputs and basically make a `State` using these though. — Cubic, Aug 09 '18 at 15:35
GHC's under-the-hood implementation is based on the *totally bogus* idea that `IO` is just like a state transformer where the state being transformed is the state of the real world (which has type `State# RealWorld`). The implementation of `ST` is similar, but requires the actions run to be polymorphic in the `State# s`. As it turns out, the fiction works out much less badly for `ST` than for `IO` (although we'd need linear types to make it bulletproof). If you seek nice `Monad` examples that can express `IO`-like things, I suggest you look at "operational monads". — dfeuer, Aug 09 '18 at 15:38
Your `andThenIO` is too lazy. Consider `andThenIO (putStrLn "Hello") (\_ -> putStrLn "Goodbye")`: this will just print "Goodbye" and ignore the first `putStrLn` altogether. That's usually the danger of `unsafePerformIO`. You'll need `seq` in there to make it work. — sepp2k, Aug 09 '18 at 15:40
@sepp2k Or, more pedantically, we need `pseq` since we care about evaluation order, which `seq` does not guarantee (IIRC). — chi, Aug 09 '18 at 20:10
Related: https://stackoverflow.com/questions/10447914/io-implementation-inside-haskell . — atravers, Oct 08 '20 at 11:56

score 15 · Accepted Answer · edited Sep 19 '20 at 07:33

(Note: this answers to the "how to explain how IO works to a beginner" part. It does NOT attempt to explain the RealWorld# hack GHC uses. Indeed, the latter is not a good way to introduce IO.)

There are many ways to explain the IO monad to beginners. It is hard because different people mentally associate monads to different ideas. You can use category theory, or describe them as programmable semicolons, or even as burritos.

Because of this, when I tried to do that in the past, I generally tried many approaches until one of them "clicks" into the mental pattern of the learner. Knowing their background helps a lot.

Imperative closures

For instance, when the learner is already familiar with some imperative language with closures, e.g. JavaScript, I tend to tell them they can pretend that the whole point of a Haskell program is to generate a JavaScript closure, which is then run using a JavaScript implementation. In this make-believe explanation, a IO T type stands for an opaque type encapsulating a JavaScript closure, which, when run, will produce a value of type T, possibly after causing some side effects -- as JavaScript can do.

So, a value f :: IO String could be implemented as

let f = () => {
    print("side effect");
    return "result";
    };

and g :: IO () could be implemented as

let g = () => {
    print("g here");
    return {};
    };

Now, assuming to have such f closure, how to invoke it from Haskell? Well, one can not directly do that, since Haskell wants to keep side effects under control. That is, we can not do f ++ "hi" or f() ++ "hi".

Instead, to "invoke a closure" we can bind it to main

main :: IO ()
main = g

Indeed, main is the JavaScript closure which is generated by the whole Haskell program, and this will be invoked by the Haskell implementation.

OK, now the question becomes: "how to invoke more than one closure?". For that, one can introduce >> and pretend that it is implemented as

function andThenSimple(f, g) {
   return () => {
      f();
      return g();
      };
}

or, for >>=:

function andThen(f, g) {
   return () => {
      let x = f();
      return g(x)();  // pass x, and then invoke the resulting closure
      };
}

return is easier

function ret(x) {
   return () => x;
}

These function take a while to explain, but it is not that hard to grasp them if one understands closures.

Pure functional (AKA stay free)

Another option is to keep everything pure. Or at least as pure as possible. One can pretend that IO a is an opaque type defined as

data IO a
   = Return a
   | Output String (IO a)
   | Input (String -> IO a)
   -- ... other IO operations here

and then pretend that the value main :: IO () is then "run" by some imperative engine later on. A program like

foo :: IO Int
foo = do
  l <- getLine
  putStrLn l
  putStrLn l
  return (length l)

actually means, according to this interpretation,

foo :: IO Int
foo = Input (\l -> Output l (Output l (Return (length l))))

Of course here return = Return, and defining >>= is a nice exercise.

Currying impurity

Forget IO, monads, and all that stuff. One could understand better two simple concepts

a -> b   -- pure function type
a ~> b   -- impure function type

the latter being a make-believe Haskell type. Most programmers should be able to have a strong intuition about what these types represent.

Now, in functional programming, we have currying, which is an isomorphism between

(a, b) -> c

and

a -> (b -> c)

After some thinking, one can see that impure functions should admit some currying as well. One can indeed be convinced that there should be some isomorphism similar to

(a, b) ~> c
   <===>
a ~> (b ~> c)

With some more thought, one can even understand that the first ~> in a ~> (b ~> c) is actually inaccurate. The curried function above does not really perform side effects when a alone is being passed -- it is the passing of b which triggers the execution of the original uncurried function, causing side effects.

So, with this in mind, we can think of the impure currying as

(a, b) ~> c
   <===>
a -> (b ~> c)
--^^-- pure!

As a particular case, we get the isomorphism

(a, ()) ~> c
   <===>
a -> (() ~> c)

Further, since (a, ()) is isomorphic to a (some more convincing is required here), we can interpret currying as

a ~> c
  <===>
a -> (() ~> c)

Now, if we baptize () ~> c as IO c, we get

a ~> c
  <===>
a -> IO c

Ah-ha! This tells us that we do not really need the general impure function type a ~> c. As long as we have its special case IO c = () ~> c, we can represent (up to isomorphism) any a ~> c function.

From here, one can start to draw a mental picture about how IO c should work, and eventually realize its monadic structure. Essentially, this interpretation of IO c is now very similar to the one exploiting closures given above.

I thoroughly approve of `IO` as a datatype, but I think using existential quantification (with GADT syntax for clarity) is an important enough optimization to be worth introducing even for a beginner. That allows a `:>>=` or `:=<<` constructor so `>>=` takes constant time. Since the `IO` interface doesn't allow inspection, that's sufficient to give good performance up to a constant factor. `data IO :: Type -> Type where {Pure :: a -> IO a; (:=<<) :: (a -> IO b) -> IO a -> IO b; PutStr :: String -> IO (); GetLine :: IO String; ...}` — dfeuer, Aug 09 '18 at 18:23
@dfeuer I'd tend to agree. Depending on how much FP the beginner has seen so far, that GADT could be a better choice. — chi, Aug 09 '18 at 20:08
Ah, the classics live on - this observation is mentioned by [Wadler](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.3579&rep=rep1&type=pdf) and here on SO; see https://stackoverflow.com/questions/6647852/haskell-actual-io-monad-implementation-in-different-language/6706442#6706442 — atravers, Aug 05 '20 at 10:12

How exactly does `IO`'s >>= work under the hood?

1 Answers1

Imperative closures

Pure functional (AKA stay free)

Currying impurity

Linked