74

There seems to be a consensus that you should use Parsec as an applicative rather than a monad. What are the benefits of applicative parsing over monadic parsing?

  • style
  • performance
  • abstraction

Is monadic parsing out?

gawi

5 Answers

99

The main difference between monadic and applicative parsing is in how sequential composition is handled. In the case of an applicative parser, we use (<*>), whereas with a monad we use (>>=).

(<*>) :: Parser (a -> b) -> Parser a -> Parser b
(>>=) :: Parser a -> (a -> Parser b) -> Parser b

The monadic approach is more flexible, because it allows the grammar of the second part to depend on the result from the first one, but we rarely need this extra flexibility in practice.
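
To see the difference concretely, here is the same small parser written both ways. This is only a sketch: it assumes Parsec's Text.Parsec.String.Parser and a Prelude that exports the Applicative operators, and the "name=number" syntax is invented for the example.

import Text.Parsec
import Text.Parsec.String (Parser)

-- The same key/value parser written twice, once applicatively and once
-- monadically. Both accept input like "foo=42".
pairA :: Parser (String, Int)
pairA = (,) <$> many1 letter <* char '=' <*> (read <$> many1 digit)

pairM :: Parser (String, Int)
pairM = do
  key <- many1 letter
  _   <- char '='
  val <- read <$> many1 digit
  return (key, val)

Both parse "foo=42" to Right ("foo",42). The applicative version never names intermediate results, which is only possible because no later parser depends on an earlier result.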

You might think that having some extra flexibility can't hurt, but in reality it can. It prevents us from doing useful static analysis on a parser without running it. For example, let's say we want to know whether a parser can match the empty string or not, and what the possible first characters can be in a match. We want functions

empty :: Parser a -> Bool
first :: Parser a -> Set Char

With an applicative parser, we can easily answer these questions. (I'm cheating a little here. Imagine we have data constructors corresponding to (<*>) and (>>=) in our candidate parser "languages".)

empty (f <*> x) = empty f && empty x
first (f <*> x) | empty f   = first f `union` first x
                | otherwise = first f

However, with a monadic parser we don't know what the grammar of the second part is without knowing the input.

empty (x >>= f) = empty x && empty (f ???)
first (x >>= f) | empty x   = first x `union` first (f ???)
                | otherwise = first x

By allowing more, we're able to reason less. This is similar to the choice between dynamic and static type systems.

But what is the point of this? What might we use this extra static knowledge for? Well, we can for example use it to avoid backtracking in LL(1) parsing by comparing the next character to the first set of each alternative. We can also determine statically whether a choice between two alternatives would be ambiguous by checking whether their first sets overlap.
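
To make that concrete, here is a minimal, self-contained sketch of such an analysis. The data type and its constructors are invented for illustration (this is the "cheating" mentioned above), and the functions are named nullable and firstSet rather than empty and first to avoid clashes with existing names.

import qualified Data.Set as Set
import Data.Set (Set)

-- A toy, first-order parser description with explicit constructors.
-- It carries no result type, to keep the sketch short.
data P
  = Eps            -- succeeds on the empty string
  | Sym Char       -- matches exactly one given character
  | Seq P P        -- sequential composition, like (<*>)
  | Alt P P        -- choice, like (<|>)

-- Can the parser match the empty string?
nullable :: P -> Bool
nullable Eps       = True
nullable (Sym _)   = False
nullable (Seq p q) = nullable p && nullable q
nullable (Alt p q) = nullable p || nullable q

-- Which characters can a successful match start with?
firstSet :: P -> Set Char
firstSet Eps       = Set.empty
firstSet (Sym c)   = Set.singleton c
firstSet (Seq p q)
  | nullable p     = firstSet p `Set.union` firstSet q
  | otherwise      = firstSet p
firstSet (Alt p q) = firstSet p `Set.union` firstSet q

-- A choice is not LL(1) if the alternatives' first sets intersect,
-- or if both alternatives can match the empty string.
overlaps :: P -> P -> Bool
overlaps p q =
  not (Set.null (firstSet p `Set.intersection` firstSet q))
    || (nullable p && nullable q)

An LL(1)-style driver can then commit to whichever alternative's first set contains the next input character instead of backtracking, and overlaps flags the places where that choice would be ambiguous.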

Another example is that it can be used for error recovery, as shown in the paper Deterministic, Error-Correcting Combinator Parsers by S. Doaitse Swierstra and Luc Duponcheel.

Usually, however, the choice between applicative and monadic parsing has already been made by the authors of the parsing library you're using. When a library such as Parsec exposes both interfaces, the choice of which one to use is purely a stylistic one. In some cases applicative code is easier to read than monadic code and sometimes it's the other way round.

hammar
  • 6
    Wait! I had thought the same until today, when it occurred to me that the `empty` test can be applied to monadic parsers as well. The reason is that we can get the value you named `???` by applying the parser `x` to the empty string. More generally, you can just feed the empty string into the parser and see what happens. Likewise, the set of first characters can be obtained at least in a functional form `first :: Parser a -> (Char -> Bool)`. Of course, converting the latter to `Set Char` would involve an inefficient enumeration of characters; that's where applicative functors have the edge. – Heinrich Apfelmus Apr 21 '12 at 08:12
  • 8
    @HeinrichApfelmus You cannot get an answer to `empty` that way. Or you *can*, but it's like answering the [halting problem](http://en.wikipedia.org/wiki/Halting_problem) with "let's run the program and see if it halts". – phadej Jul 17 '13 at 12:33
  • 3
    @hammar: what if we run `let x = pure f <*> y <*> x in empty x`? If `empty y` is `True`, then the computation doesn't terminate... just to point out that it's not that simple. – phadej Jul 17 '13 at 12:38
15

If a parser is purely applicative, it is possible to analyse its structure and "optimise" it before running it. If a parser is monadic, it's basically a Turing-complete program, and performing almost any interesting analysis of it is equivalent to solving the halting problem (i.e., impossible).

Oh, and yes, there's a stylistic difference too...

MathematicalOrchid
  • 1
    The difference between Applicative and Monad has nothing to do with Turing-completeness. In Haskell, the relative difficulty of optimizing Monad instances is only due to the historical mistake of exposing `(>>=)` alone in the type class, making it impossible for instances to provide more optimized implementations of operators like `ap`. The Applicative class avoids this mistake, and exposes `<*>` (the equivalent of `ap`). – Pi Delport Aug 03 '14 at 12:27
  • @PietDelport, I think the trouble is that the underlying representations of monadic parsers generally aren't amenable to optimized `Applicative` instances, while applicative parsers generally don't support `>>=`. – dfeuer Jul 15 '15 at 22:14
  • 4
    @PiDelport It has absolutely everything to do with Turing-completeness. `>>=` takes in the result of the parser on the left and passes it to the one on the right, thus making the resulting parser impossible to statically analyze, as now the parser itself is Turing-complete. `<*>` and `<$>` do not inspect the argument, so while the output of the parser itself is Turing-complete, as you can put anything to the left of the `<$>`, the parsing aspect itself is restricted and statically analyzable. – semicolon Oct 03 '16 at 18:29
13

The main reason I can see to prefer applicative parsers over monadic parsers is the same as the main reason to prefer applicative code over monadic code in any context: being less powerful, applicatives are simpler to use.

This is an instance of a more general engineering dictum: use the simplest tool which gets the job done. Don't use a fork lift when a dolly will do. Don't use a table saw to cut out coupons. Don't write code in IO when it could be pure. Keep it simple.

But sometimes, you need the extra power of Monad. A sure sign of this is when you need to change the course of the computation based on what has been computed so far. In parsing terms, this means determining how to parse what comes next based on what has been parsed so far; in other words you can construct context-sensitive grammars this way.
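
A concrete illustration, as a sketch with Parsec (the "3:abc" length-prefixed format is made up for the example):

import Text.Parsec
import Text.Parsec.String (Parser)

-- A length-prefixed field such as "3:abc": how many characters to read
-- is only known after the prefix has been parsed, so the second step
-- genuinely depends on the first result.
lengthPrefixed :: Parser String
lengthPrefixed = do
  n <- read <$> many1 digit   -- e.g. 3
  _ <- char ':'
  count n anyChar             -- read exactly n characters

Parsing "3:abc" yields Right "abc". There is no way to express the count n anyChar step with (<*>) alone, because n only exists once part of the input has been consumed.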

Tom Crockett
  • 3
    No, "use the simplest tool" may seem like a good rule of thumb, but actually is not. E.g., we use computer for writing letters, however computer to a sheet of paper is something like a table saw compared to a pair of scissors. – Valentin Golev Dec 06 '12 at 16:29
  • I mean, there are always upsides and downsides for every choice, but mere simplicity is a bad basis for a choice. *Especially* when you're deciding whether to use Haskell. :D – Valentin Golev Dec 06 '12 at 16:31
  • 6
    Yes, you're right. It would be better to say something like, "the right tool is the one which is maximally efficient while being minimally complex." What's missing from my description is the part about efficiency: you want a tool which is sufficiently powerful not just to do the job, but to make the job as easy as possible. But at the same time you don't want a tool which has lots of bells and whistles not applicable to the task at hand, since these most likely increase the complexity of operating it to no benefit. – Tom Crockett Dec 06 '12 at 20:42
4

With Parsec the benefit of using Applicative is just style. Monad has the advantage that it is more powerful: you can implement context-sensitive parsers.
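
For instance, requiring a closing tag to repeat the opening tag's name (one of the context-sensitive examples from Leijen's Parsec manual quoted in the comments below) is straightforward with (>>=). A sketch, using a deliberately simplified tag syntax:

import Text.Parsec
import Text.Parsec.String (Parser)

-- An XML-ish element whose closing tag must repeat the opening tag's
-- name; the closing parser is built from a previously parsed value.
element :: Parser String
element = do
  name <- char '<' *> many1 letter <* char '>'
  body <- many (noneOf "<")
  _    <- string "</" *> string name <* char '>'
  return body

Parsing "<b>hi</b>" yields Right "hi".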

Doaitse Swierstra's UU parsing is more efficient if used only applicatively.

stephen tetley
  • ISTR that formally, because Haskell allows infinite grammars, monad does not actually increase the number of recognizable languages. – luqui Oct 22 '11 at 19:13
  • 4
    @luqui I'm curious about your comment. Here's a language. The alphabet is Haskell `String`s, and the language is the set of words in which all the letters are equal. This is dead easy as a monadic parser: `option [] (anyToken >>= many . exactToken)` (where here `anyToken` and `exactToken` aren't actually a part of the Parsec library, but probably ought to be; ask me if you're unsure of what they do). How would the corresponding applicative parser look? – Daniel Wagner Oct 22 '11 at 21:15
  • 2
    @stephen, can you give a reference for context-sensitive parsers? I'm curious what the exact power of monadic and applicative parsers is. – sdcvvc Oct 22 '11 at 21:47
  • 1
    @sdcvvc: [this paper](http://research.microsoft.com/en-us/um/people/daan/download/papers/parsec-paper.pdf) discusses the relative power of arrow parsers vs. monadic parsers and points out that monadic parsers enable parsing context-sensitive grammars while arrows do not. I believe applicative parsers would be strictly less powerful than arrow parsers. – Tom Crockett Oct 22 '11 at 23:06
  • @sdcvvc - quoting from Daan Leijen's Parsec manual: "Examples of context sensitive parsing are XML tags, variable definition before use and environments in TeX." See page 25. – stephen tetley Oct 23 '11 at 11:26
  • 3
    This post (and the comments) explains that applicative parsers allow parsing all recursively enumerable languages: http://byorgey.wordpress.com/2012/01/05/parsing-context-sensitive-languages-with-applicative/ – Blaisorblade Dec 17 '12 at 03:04
  • @DanielWagner, to respond a mere five years late: you can lazily enumerate all strings that consist of just one character and apply `asum`. If the parser's `<|>` is sufficiently lazy, I believe this will get you the grammar you seek, very inefficiently. Of course, not all parsers have lazy `<|>`. `regex-applicative` certainly does not; I don't know about more traditional parser combinator libraries. – dfeuer Oct 03 '16 at 21:50
3

Monads are strictly a more featureful abstraction than Applicatives. You could write

instance (Monad m) => Applicative m where
  pure  = return
  (<*>) = ap

But there is no way to write

instance (Applicative a) => Monad a where
  return = pure
  (>>=) = ???

So yes, it is essentially a matter of style. I originally imagined that if you use return and ap, there should be no performance loss over using pure and <*>, but that is not quite right: because Applicative is a strictly smaller interface than Monad, <*> can sometimes be more highly optimized than ap. (But with clever GHC rewrite rules, one can often achieve the same optimizations regardless.)
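
For reference, ap is defined in Control.Monad essentially like this, which shows why a Monad-only interface has to thread everything through (>>=) even when the second computation's structure never depends on the first result:

ap :: Monad m => m (a -> b) -> m a -> m b
ap mf mx = mf >>= \f -> mx >>= \x -> return (f x)

A parser library's own <*> is free to combine the two parsers' static information up front, which this generic definition cannot do.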

Is monadic parsing out?

Since Monads are a subset of Applicatives, I would conclude that applicative parsing is a subset of monadic parsing.

Dan Burton
  • Did you mean to say that Monads are a _superset_ of Applicatives? – Guildenstern Nov 08 '13 at 17:10
  • 2
    @Guildenstern Monadic operations are a superset of Applicative operations. But said another way: types that have an instance of `Monad` are a subset of types that have an instance of `Applicative`. When speaking of "Monads" and "Applicatives," one is usually referring to the types, not the operations. – Dan Burton Nov 08 '13 at 17:37
  • 2
    "I imagine if you use `return` and `ap`, then there should be no performance loss over using `pure` and `<*>`." IIRC that is not the case. There are plenty of situations (parsers being only one of them) where `<*>` outperforms `ap`. – semicolon Oct 03 '16 at 18:30
  • @semicolon you're absolutely right; I've updated my answer accordingly. – Dan Burton Oct 03 '16 at 18:51