
I've got this text:

--make_123
Say:Hello

Say:Bye
--make_123--

I need to parse it into a map like this:

{
  makeId : 123
  firstNode : Hello
  secondNode : Bye
}

I've got the following Scala code:

class MyParser extends RegexParsers {

  def parseText(input: String): MapContent = parseAll(parseRequest, input) match {
    case Success(result, _) => result
    case NoSuccess(msg, _) => throw new SomeException(msg)
  }

  def parseRequest: Parser[MapContent] = parseMakeId ~ parseText ^^ {
    case makeId ~ firstNode => {
      MapContent(
        Map("makeId" -> makeId) ++
          Map("firstNode" -> firstNode))
    }
  }


  def parseMakeId: Parser[String] = "--make_" ~> ".*".r

  def parseText: Parser[String] = "Say:" ~> ".*".r

}

case class MapContent(map: Map[String, String])

With this code I get the following error:

string matching regex `.*\z$' expected but `S' found

which points at the second line, at the first character of the literal "Say:".

How can I parse this text? And how do I skip the empty line (the 3rd line)? Thanks.

perc
  • Please read http://stackoverflow.com/questions/1798738/scala-parser-token-delimiter-problem. I think changing to something like `def parseMakeId: Parser[String] = "--make_" ~> "\d+(?=[\r\n]+Say)".r` will help. – Wiktor Stribiżew Apr 29 '15 at 12:48
  • @stribizhev Unfortunately, it doesn't help me. – perc Apr 29 '15 at 13:01

1 Answer


I assume here that the text you parse always contains exactly two Say nodes, and that you don't need to check that the closing line --make_123-- is valid (i.e., has the same id as the first line).

You don't have to skip the empty line manually. RegexParsers skips whitespace, including newlines, before every literal and regex it matches, so the empty line is ignored automatically.

You are using parseAll, which means your parser has to match the whole input passed to it. So you have to extend your grammar to also parse the second Say line and the closing line --make_123--.

That's simple to do with the parser functions you have already defined, and with a slight modification to the function producing MapContent from the parsed result.

The parseRequest function should be changed to have the following definition:

  def parseRequest: Parser[MapContent] = 
    parseMakeId ~ parseText ~ parseText <~ parseMakeId ^^ {
      case makeId ~ firstNode ~ secondNode =>
        MapContent(
          Map("makeId" -> makeId,
              "firstNode" -> firstNode,
              "secondNode" -> secondNode))
  }
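
For reference, a self-contained sketch of the whole parser with this change applied might look like the following. It assumes the scala-parser-combinators module is on the classpath (it has been a separate library since Scala 2.11). The names `MakeParser`, `sayLine` and `parseInput` are made up for this example, and returning an `Either` instead of throwing is a choice of this sketch, not something taken from the question's code.

```scala
import scala.util.parsing.combinator.RegexParsers

case class MapContent(map: Map[String, String])

// Hypothetical object name; the grammar follows the definitions above.
object MakeParser extends RegexParsers {

  // "--make_" followed by the rest of the line. On the opening line this
  // yields "123"; on the closing line it yields "123--", which is discarded.
  def parseMakeId: Parser[String] = "--make_" ~> ".*".r

  // "Say:" followed by the rest of the line ('.' does not match a newline).
  // Renamed from parseText (an assumption of this sketch) so it does not
  // share a name with the entry point.
  def sayLine: Parser[String] = "Say:" ~> ".*".r

  // Opening line, two Say lines, closing line. Blank lines in between are
  // skipped because RegexParsers skips whitespace before each token.
  def parseRequest: Parser[MapContent] =
    parseMakeId ~ sayLine ~ sayLine <~ parseMakeId ^^ {
      case makeId ~ firstNode ~ secondNode =>
        MapContent(Map(
          "makeId" -> makeId,
          "firstNode" -> firstNode,
          "secondNode" -> secondNode))
    }

  // Hypothetical entry point: Left carries the parser's error message.
  def parseInput(input: String): Either[String, MapContent] =
    parseAll(parseRequest, input) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

With the text from the question, `MakeParser.parseInput(text)` should return a `Right` whose map contains `makeId -> 123`, `firstNode -> Hello` and `secondNode -> Bye`.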
Kolmar
  • Thank you! But how do I fix the regex exception from my question? – perc Apr 29 '15 at 13:07
  • @perc That exception means the parser expected no more input after the first Say line, but found the second Say line (that's the reason for `'S'` in the exception message). The code in my answer should fix it. If it doesn't, please let me know (I've tested the modified parser on the text from your question and it works for me). – Kolmar Apr 29 '15 at 13:09
  • Thank you, it works! BTW, I've got an example where "-make" repeats many times because it is used as a delimiter between SAY messages. What should I do in that case? – perc Apr 29 '15 at 13:19
  • @perc It's hard to answer exactly without seeing some examples, but if, for instance, the input is a sequence of `make` groups (without nested `make`s), you could try the `rep` function http://www.scala-lang.org/api/2.11.6/scala-parser-combinators/index.html#scala.util.parsing.combinator.Parsers@rep[T]%28p:=%3EParsers.this.Parser[T]%29:Parsers.this.Parser[List[T]] (`def foo: Parser[List[MapContent]] = rep(parseRequest)`). Nesting, however, requires significant changes to the grammar. – Kolmar Apr 29 '15 at 13:27
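
A minimal sketch of the `rep` idea from the last comment, assuming the input is a flat sequence of `make` groups that each contain exactly two Say lines (no nesting). The names `MakeGroupsParser`, `sayLine`, `parseRequests` and `parseInput` are made up for this example, and the real input format may well differ from this assumption.

```scala
import scala.util.parsing.combinator.RegexParsers

case class MapContent(map: Map[String, String])

// Hypothetical object name; the single-group grammar is the one from the answer.
object MakeGroupsParser extends RegexParsers {

  // "--make_" followed by the rest of the line.
  def parseMakeId: Parser[String] = "--make_" ~> ".*".r

  // "Say:" followed by the rest of the line.
  def sayLine: Parser[String] = "Say:" ~> ".*".r

  // One group: opening line, two Say lines, closing line.
  def parseRequest: Parser[MapContent] =
    parseMakeId ~ sayLine ~ sayLine <~ parseMakeId ^^ {
      case makeId ~ firstNode ~ secondNode =>
        MapContent(Map(
          "makeId" -> makeId,
          "firstNode" -> firstNode,
          "secondNode" -> secondNode))
    }

  // rep applies parseRequest zero or more times and collects the results.
  def parseRequests: Parser[List[MapContent]] = rep(parseRequest)

  // Hypothetical entry point: Left carries the parser's error message.
  def parseInput(input: String): Either[String, List[MapContent]] =
    parseAll(parseRequests, input) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

Because `parseAll` still has to consume the whole input, a malformed or incomplete group anywhere in the sequence makes the overall parse fail rather than silently dropping the remaining groups.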