Scala Parser and combinators: java.lang.RuntimeException: string matching regex `\z' expected

Question

I am trying to parse some text following a grammer for Dynamic Epistemic Logic using Scala's RegexParser, as part of my Master Thesis. But I keep getting the same error on simple logical conjunctions. I understand where and why it's failing, but not why it's matching what it is in the first place.

My Code (severely boiled down to isolate the problem):

import scala.util.parsing.combinator._

class Formula() {
      def and(q:Formula) = Conjunction(this, q) // ∧
}

abstract class Literal extends Formula
abstract class Constant extends Formula

case class Atom(symbol:String) extends Literal 
case class NotAtom(p:Atom) extends Literal

case class Conjunction(p:Formula, q:Formula) extends Formula

class mapParser extends RegexParsers {
    val conjOp = "&"
    val negOp = "~"

    val listseparator = ","

    val leftparen = "("
    val rightparen = ")"

    def id:Parser[String] = "[a-z_]+".r // fluents are never capitalized. but may have underscore.
    def litargs: Parser[String] = repsep("[a-zA-Z]+".r,listseparator) ^^ {case list => "(" + list.toString.stripPrefix("List") + ")"}

    def atom: Parser[Atom] = id~leftparen~litargs~rightparen ^^ {case head~_~tail~_ => Atom(head+tail)}
    def negAtom: Parser[NotAtom] = negOp~>atom ^^ (NotAtom(_))
    def literal: Parser[Literal] = negAtom | atom 

    def and: Parser[Formula] = formula~conjOp~formula ^^ {case p1~_~p2 => Conjunction(p1,p2)}  

    def formula: Parser[Formula] = literal | and
};

object DomainParser extends mapParser {
  def test() =  {
    val domainDesc ="present(A) & ~present(B)";

    println("input: " + domainDesc)
    println("result: " + apply(domainDesc))
  }

  def apply(domainDesc: String) = parseAll(formula, domainDesc) match {
    case Success(result, _) => result
    case failure : NoSuccess => scala.sys.error(failure.msg)
  }
}

I am calling the DomainParser.test() function externally from java. The input is

present(A) & ~present(B)

which should yield:

Conjunction(Atom(present((A))),NotAtom(Atom(present((B)))))

but instead gives me the error:

Exception in thread "main" java.lang.RuntimeException: string matching regex `\z' expected but `&' found
    at scala.sys.package$.error(package.scala:27)
    at mAp.DomainParser$.apply(DEL.scala:48)
    at mAp.DomainParser$.test(DEL.scala:43)
    at mAp.DomainParser.test(DEL.scala)
at ma.MA.main(MA.java:8)

Furthermore, if I call the 'and' parser directly instead of the 'formula' parser, it works fine. Hence the problem seems to be with this line:

def formula: Parser[Formula] = literal | and

Because it attempts to parse the whole line as a single literal. It then parses present(A) correctly, but instead of failing on the '&' (not part of literal's parser) and returning to parse as an 'and'-term, it fails with the exception.

I cannot for the love of... see why it tries to match any '\z' at all. It is not included in the grammar by me, and even if it was - shouldn't it fail and try to parse as the next term instead of exiting with an exception? I am torn between thinking there is some in-built functionality for end-of-string terms that I do not know, to thinking there is something hugely obvious staring me in the face.

Any help would be sorely needed, very welcome and thank you very much in advance.

Dan True

How would an input with multiple '&' look like? `present(A) & ~present(B) & present(C)`? — Kigyo, Mar 06 '14 at 02:49
Yes. Which (if I have specified the BNF correctly) should hopefully match something like: Conjunction(Conjunction(Atom(present((A))),NotAtom(Atom(present((B)))), (Atom(present((C))))) — Dan True, Mar 06 '14 at 08:45
Well, I'm not an expert on regular expressions, but you can have a look at this post: http://stackoverflow.com/questions/1798738/scala-parser-token-delimiter-problem It seems that literal is tried first and succeeds, but then `\z` which stands for the end of input is expected but there is still the rest of the String which is `& ~present(B)` — Kigyo, Mar 06 '14 at 13:43
Kigyo: Yes, I found the same question afterwards. So I need to find a way to make the Literal regex fail instead of succeeding when it is followed by whitespaces and a delimiter. Thank you. I will see if I can handle it. — Dan True, Mar 06 '14 at 14:02
That solved it, but resulted in an stackoverflow due to left recursion. I decided to remove the infic notation and instead have "&(present(A), ~present(B)) which appears to work fine. Any tips as to how to achieve infix operator notation would be welcome. — Dan True, Mar 06 '14 at 22:28

Kigyo · Accepted Answer · 2014-03-07T20:31:52.933

I'll just add a similar Parser for propositional formulas I made. Maybe this might help you.

'+' = top/true

'-' = bottom/false

'!' = negation

'&' = conjunction

'|' = disjunction

'>' = implication

'<' = equivalence

object FormulaParser extends StandardTokenParsers with PackratParsers {
  //Symbols for all connectives
  private val parseSymbols = List("(", ")", "+", "-", "!", "&", "|", ">", "<")
  lexical.delimiters ++= parseSymbols

  private lazy val formula: PackratParser[Formula] = implication | equivalence | conjunction | disjunction | term
  private lazy val formulaWithoutBrackets: PackratParser[Formula] = implication | equivalence | conjunction | disjunction | termWithoutBrackets

  private lazy val term: PackratParser[Formula] = top | bottom | variable | parens | negation
  private lazy val termWithoutBrackets = top | bottom | variable | negation

  private lazy val top: PackratParser[Formula] = "+" ^^^ { Top() }
  private lazy val bottom: PackratParser[Formula] = "-" ^^^ { Bottom() }
  private lazy val variable: PackratParser[Formula] = ident ^^ { Variable(_) }
  private lazy val parens: PackratParser[Formula] = "(" ~> formulaWithoutBrackets <~ ")"
  private lazy val negation: PackratParser[Formula] = "!" ~> term ^^ { Negation(_) }

  private lazy val conjunction: PackratParser[Formula] = term ~ "&" ~ term ~ rep("&" ~> term) ^^ {
    case p ~ "&" ~ q ~ conj => conj.foldLeft(Conjunction(p,q))((con, elem) => Conjunction(con, elem))
  }

  private lazy val disjunction: PackratParser[Formula] = term ~ "|" ~ term ~ rep("|" ~> term) ^^ {
    case p ~ "|" ~ q ~ disj => disj.foldLeft(Disjunction(p,q))((dis, elem) => Disjunction(dis, elem))
  }

  private lazy val implication: PackratParser[Formula] = (conjunction | disjunction | term) ~ ">" ~ (conjunction | disjunction | term) ^^ { case p ~ ">" ~ q => Implication(p, q) }

  private lazy val equivalence: PackratParser[Formula] = (conjunction | disjunction | term) ~ "<" ~ (conjunction | disjunction | term) ^^ { case p ~ "<" ~ q => Equivalence(p, q) }
}

With this you can parse input like: (p & q) | (!q > (r & s))

Here Conjunction and Disjunction also bind stronger than Implication and Equivalence.

p & q > r | s will result in Implication(Conjunction(Variable(p), Variable(q)), Disjunction(Variable(r), Variable(s)))

Awesome, I Can use that for inspiration for sure :-) I am unsure as to why yours can handle infix operators when mine cannot - even when implemented as a packratparser. I will look into the details of your code and see if I can figure it out. — Dan True, Mar 08 '14 at 18:49
Well, this parser has no left-recursion. I think it has nothing to do with the parser type. I just did it as a packrat-parser because it sped up the parsing alot, but I don't know much about it (in theory and whats it does differently) — Kigyo, Mar 09 '14 at 00:05

Kigyo · Answer 2 · 2014-03-07T17:56:41.260

0

Ok. If it's just due to the left-recursion I have a similar parser, where I resolved that.

You have to change the following:

def and: Parser[Formula] = literal~conjOp~literal~rep(conjOp ~> literal) ^^ {
  case p ~ conjOp ~ q ~ conj => conj.foldLeft(Conjunction(p,q))(Conjunction(_, _))
}
def formula: Parser[Formula] = and | literal

Since in the end there are only Literals connected via Conjunction, you can rewrite and like that.

Slightly more complex example with your code:

input: p(A,B) & ~p(B) & p(C)
result: Conjunction(Conjunction(Atom(p(A,B)),NotAtom(Atom(p(B)))),Atom(p(C)))

edited Mar 07 '14 at 17:56

answered Mar 07 '14 at 17:50

Kigyo

5,668
1
20
24

Yeah, that would work for this simplificaton. But this was just a boiled-down version to isolate the error. In the real code I have all other propositional constructions (&, |, ->, <-, <->, <+>, true, false), plus a set of Belief, EveryoneKnows and CommonKnowledge operators all of which can be recursively included. So far I have decided to simply go with prefix operators. It works like a charm. – Dan True Mar 07 '14 at 18:48
I also have a parser for all propositional Connectives, which works with infix, but not with "Belief, EveryoneKnows and CommonKnowledge". I think it would not be hard to adapt it. If you want I can give it to you. – Kigyo Mar 07 '14 at 18:50

Scala Parser and combinators: java.lang.RuntimeException: string matching regex `\z' expected

2 Answers2