8

Documentation for the parsec package states that u argument is used to carry some user state through monadic computation. But the same functionality can be achieved by basing ParsecT monad transformer on State monad. So if my parser is not stateful, i don't need u altogether, but have to set it to () with parsec. What's rationale for adding non-optional state support to ParsecT?

arrowd
  • 33,231
  • 8
  • 79
  • 110
  • What do you mean non-optional? How would it be if it were optional? – Shoe Jan 08 '15 at 08:16
  • It wouldn't be at all and the `ParsecT` type would have only 3 arguments - `s m a`. I meant that if user wants state, he can base on `State`. – arrowd Jan 08 '15 at 08:25
  • You can just use a type alias: `type YourParsec s = Parsec s ()`, no? – Shoe Jan 08 '15 at 08:44
  • 4
    I suppose it's for optimisation purposes: they presumably found out that an explicit stack with `State` slows down parsing, whereas an unused `()` built-in state has almost no effect on performance. – leftaroundabout Jan 08 '15 at 08:52
  • 1
    That’s an assumption (and a good one), but do we know for sure? – Joachim Breitner Jan 08 '15 at 12:22
  • My wild guess: originally, only the `Parsec s u` monad was designed, which included the state type `u`. Then, the transformer `ParsecT` was designed so to allow `Parsec s u a = ParsecT s u Identity a`. Now, the `u` was no longer needed, but the design stuck with it instead of having `Parsec s u a = ParsecT s (State u) a` and one fewer parameter to `ParsecT`. – chi Jan 08 '15 at 13:16
  • 1
    You may want to look at http://stackoverflow.com/a/24981676/570682 - the user state is reset when trying alternatives (as with `(<|>)`) but an underlying `MonadState m` would not be. – Lambdageek Jan 08 '15 at 14:05

2 Answers2

5

Because a parser of type ParsecT s () (State st) a behaves differently from a parser of type Parsec s st Identity a when it comes to backtracking:

  • User state resets when parsec tries an alternative after a failing parse that consumes no input.
  • But the underlying Monad m does not backtrack; all the effects that happened on the way to a final parse result are kept.

Consider the following example:

{-# LANGUAGE FlexibleContexts #-}
module Foo where

import Control.Applicative
import Control.Monad.State
import Text.Parsec.Prim hiding ((<|>), State(..))
import Text.Parsec.Error (ParseError)

tick :: MonadState Int m => ParsecT s Int m ()
tick = do
  lift $ modify (+1)
  modifyState (+1)

tickTock :: MonadState Int m => ParsecT s Int m ()
tickTock = (tick >> empty) <|> tick

-- | run a parser that has both user state and an underlying state monad.
--
-- Example:
-- >>> run tickTock
-- (Right 1,2)
run :: ParsecT String Int (State Int) () -> (Either ParseError Int, Int)
run m = runState (runParserT (m >> getState) initUserState "-" "") initStateState
  where initUserState = 0
        initStateState = 0

As you can see, the underlying state monad registered two ticks (from both alternatives that were tried), while the user state of the Parsec monad transformer only kept the successful one.

Lambdageek
  • 12,465
  • 1
  • 25
  • 33
1

ParsecT carries it's own state already: parsing position and input: http://haddocks.fpcomplete.com/fp/7.8/20140916-162/parsec/Text-Parsec-Prim.html#t:State

So as leftaroundabout pointed out, it's probably due optimisation purposes.

Community
  • 1
  • 1
phadej
  • 11,947
  • 41
  • 78