0

I'm working on a parser that parses a few terms that have an optional initial character that is the same for all the terms. My problem is, some of the terms first character is the same as the initial character. I'm trying to avoid rewriting each term with and without the initial character. I figured a packrat parser sounded like the solution but I've yet to get it working the way I want. Below is a minimal example of what I'm trying to accomplish.

import scala.util.parsing.combinator._
import scala.util.parsing.input._

object test extends App {
  object MyParser extends RegexParsers with PackratParsers {
    lazy val p1:Parser[String] = "ab"
    lazy val p2:Parser[String] = "cd"
    lazy val parser:PackratParser[String] = ("a".?) ~> ( p2 | p1 )
  }

    def parsing[T](s: String)(implicit p: MyParser.Parser[T]): T = {
      import MyParser._
      val phraseParser = phrase(p)
      val input = new PackratReader(new CharSequenceReader(s))
      phraseParser(input) match {
        case Success(t, _) => t
        case NoSuccess(msg, _) => throw new IllegalArgumentException(
          "Could not parse '"+s+"': "+msg)
      }
    }                                 //> parsing: [T](s: String)(implicit p: test.MyParser.Parser[T])T

  implicit val parser = MyParser.log(MyParser.parser)("parser")
  println(parsing("cd"))  /* works */
  println(parsing("acd")) /* works */
  println(parsing("aab")) /* works */

  // Should work but doesn't.
  println(parsing("ab"))  /* expected "ab" but found "b" */
}
zarthross
  • 763
  • 6
  • 13

2 Answers2

2

After looking at @ishaaq's answer I came up with a solution that I believe is a little more elegant.

implicit class ParserExtension[T](val p: Parser[T]) extends AnyVal {
  def ?~[U](q: => Parser[U]): Parser[Option[T] ~ U] = {
    ((p ^^ Some.apply) ~ q) | (success(None) ~ q)
  }
}

This implicit class adds a method to all my parsers so I can have an optional parser with back tracking. You can use it like so:

val a:Parser[String] = "a"
val parser:Parser[Option[String] ~ String] = a ?~ ( p2 | p1 )

Here is a gist with the full source for my example

zarthross
  • 763
  • 6
  • 13
1

I believe the problem is that it isn't easy to get RegexParsers to backtrack and retry alternate paths in partially successful matches, i.e. the "a" in "ab" matches the optional "a".? part and then the "b" fails the subsequent match, but then after that failure, no further attempt is made to try matching with the optional prefix.

See this answer for more detail and a (non-trivial) solution.

Community
  • 1
  • 1
ishaaq
  • 6,329
  • 3
  • 16
  • 28
  • Thanks for the help. Reading the link you posted inspired me to dig a little deeper into how | and opt/? worked and I was able to find a more elegant solution. – zarthross Mar 26 '14 at 02:46
  • It looks like parser-combinators belong to the PEG class: [Whereas regular expression matchers may start by matching greedily, but will then backtrack and try shorter matches if they fail and CFG tries every possibility, PEG's `*`, `+`, and `?` operators always behave greedily, consuming as much input as possible and never backtracking: Expression `a*` will always consume as many a's as are consecutively available in the input string, causing `(a* a)` to fail always.](https://en.wikipedia.org/wiki/Parsing_expression_grammar#Operational_interpretation_of_parsing_expressions) – Little Alien Nov 02 '16 at 17:10