2

From reading this answer, I understand that the inclusion of empty in Alternative was largely a design decision to make Alternative a monoid (and thus more powerful). It seems to me this was also because otherwise you couldn't express any laws for Alternative.

But this is a pain if I have a generic applicative parser like so:

newtype Parser err src target = Parser (ExceptT err (State [src]) target)
  deriving (Functor, Applicative, Alternative, Monad, MonadState [src], MonadError err)

Clearly, we can get the same behavior as <|> and many/some from Control.Applicative with:

option :: Parser e s t -> Parser e s t -> Parser e s t
option parserA parserB = do
  state <- get
  parserA `catchError` \_ -> put state >> parserB

many :: Parser e s t -> Parser e s [t]
many parser = some parser `option` return []

some :: Parser e s t -> Parser e s [t]
some parser = (:) <$> parser <*> many parser

Even though none of these use empty, it seems like I am forced to re-implement them instead of deriving Alternative, because I can't conceive a generic way to instance empty for it (of course, I'd still need to instance <|> to get the preservation of state on parserA error, but then I could get some, many, optional and friends for free).

Digging into Parsec's source, it seems that it subverts this because it doesn't allow for custom error types (or at least, Parsec isn't parameterized by a custom error type):

instance Applicative.Alternative (ParsecT s u m) where
    empty = mzero
    (<|>) = mplus

instance MonadPlus (ParsecT s u m) where
    mzero = parserZero
    mplus p1 p2 = parserPlus p1 p2

-- | @parserZero@ always fails without consuming any input. @parserZero@ is defined
-- equal to the 'mzero' member of the 'MonadPlus' class and to the 'Control.Applicative.empty' member
-- of the 'Control.Applicative.Alternative' class.

parserZero :: ParsecT s u m a
parserZero
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ unknownError s

unknownError :: State s u -> ParseError
unknownError state        = newErrorUnknown (statePos state)

newErrorUnknown :: SourcePos -> ParseError
newErrorUnknown pos
    = ParseError pos []

Taking inspiration from this, it seems like the only reasonable way around this is to have my generic parser wrap the user error type with something like:

 data ParserError err = UserError err | UnknownError

 newtype Parser err src target = Parser (ExceptT (ParserError err) (State [src]) target)
  deriving (Functor, Applicative, Alternative, Monad, MonadState [src], MonadError err)

And then empty can be:

empty = throwError UnknownError

This just feels wrong, though. This wrapping only exists to appease the requirement for an empty and it makes consumers of this generic parser do more work to handle errors (they also must handle UnknownError now and unwrap their custom errors). Is there any way to avoid this?

duplode
  • 33,731
  • 7
  • 79
  • 150
Bailey Parker
  • 15,599
  • 5
  • 53
  • 91
  • Maybe I'm not understanding the problem correctly; Why not simply require there to be a `Default` instance for the error or something along those lines? No need add wrapping/unwrapping (unless you want to use it with an error type that doesn't have a default, in which case you'd just do *both* the wrapping and the unwrapping on the use end) – Cubic Feb 28 '18 at 15:01
  • I considered requiring a default. Decided against it because I wasn't sure I had the chops to make such a monad (I think I know how to though). Short of that, the entire thing I was trying to avoid was the unwrapping on the user end (presumably my library would need to wrap in its own error type). Although given that Megaparsec and Parsec just define their own, so it seems like this is the way to go. – Bailey Parker Mar 02 '18 at 21:48

1 Answers1

4

It's a problem of standard base hierarchy. It's not extremely modular. In some perfect world Alternative would be splitted into three type classes if we want to achieve maximum modularity.

See Alternative definition in PureScript world:

Like this:

class Functor f <= Alt f where
  alt :: forall a. f a -> f a -> f a  -- alt is (<|>)

class Alt f <= Plus f where
  empty :: forall a. f a

class (Applicative f, Plus f) <= Alternative f

So, if type hierarchy is modular enough you could implement Alt for your Parser type (and have all <|>-only-related functions) but not Alternative. If Alternative is Monoid then Alt is Semigroup: you can append elements but you don't have empty element.

Note, that previously in GHC base package Applicative wasn't a superclass of Monad, Semigroup wasn't in base and only in GHC-8.4.1 Semigroup will be superclass of Monoid. So you can expect that in some future similar thing (modularization) can happen with Alternative.

Shersh
  • 9,019
  • 3
  • 33
  • 61
  • 1
    http://hackage.haskell.org/package/semigroupoids-5.2.2/docs/Data-Functor-Alt.html – phadej Feb 26 '18 at 10:12
  • Very interesting to see that in other worlds the more modular choice was made! Thanks for the detail! Also interesting that in lieu of an `empty` the laws are just for associativity and distributivity. I guess we can only wait for this modularization to mature over time. As a tangent, I'm curious what you think about the design choice of having a wrapper error type to `instance Alternative`. Parsec as reference art didn't suggest to me whether this was a good or bad approach. Perhaps you have more experience? – Bailey Parker Feb 27 '18 at 11:29
  • @BaileyParker Well, `parsec` is very old and you should use `megaparsec` (at least for reference): https://github.com/mrkkrp/megaparsec#megaparsec-vs-parsec As I can see from `megaparsec` sources, it also defines `empty` using special error type: https://hackage.haskell.org/package/megaparsec-6.4.0/docs/src/Text-Megaparsec.html#line-441 I think this is a necessary workaround for current type class hierarchy. – Shersh Feb 27 '18 at 14:17
  • Thanks for pointing me to megaparsec! It looks like fantastic reference art. And thanks for your answer :) – Bailey Parker Feb 27 '18 at 20:36