29

I'd like to learn the concept, so that I'd be able to understand and use libraries such as machines.

I tried to follow Rúnar Bjarnason's talk on machines, but there is too little information, basically just a bunch of data types. I can't even understand what k is in

newtype Machine k o = Step k o (Machine k o)
data Step k o r = Stop
                | Yield o r
                | forall t . Await (t -> r) (k t) r

or what's t is and why it's quantified. Or, what's the conceptual difference between conduit-like libraries and machines?

Gilles 'SO- stop being evil'
  • 104,111
  • 38
  • 209
  • 254
Petr
  • 62,528
  • 13
  • 153
  • 317

1 Answers1

45

conduit and pipes are both far more mature than machines, but -- that said -- machines is trying to take a different path than conduit and pipes.

With machines, I'm trying for a relatively simple API in terms of type arguments. Both conduit and pipes have chosen to unify all of their concepts by using 5-6 different type variable arguments.

Machines takes a different approach of parameterizing a machine (or Plan) on its "input language", which puts all the onus on a single extra argument (or two in the case of a Plan). Also by choosing to parameterize the input language in this way it opens up possibilities of using machines that can accept input (non-)deterministically from multiple input sources. The result is basically just a free monad with an extra 'emit' instruction!

In exchange for a slightly more rigorous policy about how you build and use machines, it can eventually offer better safety about asymptotics of the resulting code.

That said, pipes and conduit have had a lot of real world use and machines is more or less a playground for me, Rúnar Bjarnason and Paul Chiusano.

It is currently suitable for working on input that you intend to consume entirely, but less so for working with complicated resources or for parsing than what you get with those other two APIs.

Now, about that quantifier!

t there is actually existentially quantified. By doing so we can make the Monad for machines not care about the functoriality of the k parameter. This is important because of the way Source is implemented. If I didn't need Source to work, then we could use the simpler

data Step k o r = Stop
                | Yield o r
                | Await (k r) r

This would have the unfortunate side-effect that when you went to compose a Machine with a Source, that the compiler wouldn't know what Functor instance to pick and you'd be swimming in unnecessary type annotations.

The existential quantification there is a trick I picked up when working on the kan-extensions package. It is a generalization of one of the Yoneda types from there.

Edward Kmett
  • 29,632
  • 7
  • 85
  • 107
  • 1
    Thank you, could you perhaps elaborate a bit about `k`? Why does `Await` consist of `k r`, especially, why is `k :: * -> *` and parametrized by `r`? I tried to examine `Tee`, `Wye` and `Is` to see some examples of different `k`s, but still I'm far from understanding it. – Petr Jun 24 '13 at 20:12
  • 4
    `k a` is the type of a request you are making of whatever is feeding you data. The `a` represents the result of the request. `Await` effectively has two arguments: "The request with what to do if the request succeeds" and "what do to on a failure". For a single input you could use k = (->) i` so that `Machine ((->) i)` is given a function from `(i -> r)` that you can supply the input to. With `Tee` you wind up with one of two functions so you can block on either input separately. `Wye` permits you to block on either or both inputs at the same time. `Is` is just used like `(->)` above. – Edward Kmett Jun 24 '13 at 20:26
  • 1
    One more question, why there are `Plan`s and `Machine`s? Why the distinction? – Petr Jun 25 '13 at 07:37
  • 3
    By separating `Plan` from `Machine` we can do two things. First `Plan` is in CPS/Codensity form internally. This makes (>>=) cheap even if you have left associated binds (unlike with `pipes`) while `Machine` is in a more traditional ADT form that makes pattern matching cheaper. That means its cheap to build a `Plan`, but cheap to run a `Machine`. The `pipes` approach of using one type for both means one role suffers. In particular if you pipe then bind, you pay for how many steps were taken in the pipe to get over the (>>=)! – Edward Kmett Jun 27 '13 at 04:28
  • 3
    Also, by separating `Plan` from `Machine`, we get two different monads we can talk about. The monad for `Plan` is basically a free monad of requests in the input language followed by emissions on the output channel, while the monad for `Machine` is more like a list monad where you chain together the resulting machines forgetfully. Each has different uses but notably we can get by with pretty much standard combinators for working with each rather than make up our own classes/types for everything. – Edward Kmett Jun 27 '13 at 04:30
  • Thanks! Now I understand. I had the same problem when implementing [scala-conduit](https://github.com/ppetr/scala-conduit). Each monadic bind causes a new traversal of the whole structure. – Petr Jun 27 '13 at 05:53
  • 1
    NP: Note the version of `scala-machines` that I think is on Runar's account actually suffers the cost of binding as it is based on the simpler unified `Plan`/`Machine` model, but then the codensity trick doesn't work in Scala, so there it'd have to trampoline the `Plan` to get the right asymptotics. – Edward Kmett Jun 27 '13 at 06:10
  • 3
    If you look at the code for `Trampoline` in `scalaz` you'll see how you can avoid the complete structure re-traversal and how to avoid blowing your stack for non-trivial inputs. Sadly scalable/working monadic code in scala comes at a real cost. Trampolining can be 50x slower, so you have to choose between 3 options in scala: fast and loose while crashing on non-trivial inputs, slow but correct with frames on the heap, or littering your code with side-effects and non-local reasoning. – Edward Kmett Jun 27 '13 at 06:13