2

I am having some issues specifying and parsing a rather simple grammar.

vertex = char+
edge = vertex " -> " vertex
start = ((vertex | edge) eol)*

input = "a\nb\na -> b\n"

Spirit is doing the following:

"a" -> vertex
"\n" -> eol
-> start

"b" -> vertex
"\n" -> eol
-> start

and

"a" -> vertex
terminate

instead of identifying the edge in the end and parsing the entire input. That is, it could parse the entire input but it's not. Shouldn't it backtrack and attempt to parse using the alternate rule? And thus complete the start rule.

Is it because Spirit uses PEGs? (http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/abstracts/parsing_expression_grammar.html#spirit.abstracts.parsing_expression_grammar.alternatives)

Minimal working example:

#include <cstdio>
#include <string>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

void print(const std::string & str) {
  printf("%s\n", str.c_str());
}

void print_eol() {
  printf("eol\n");
}

int main() {
  std::string str = "a\nb\na -> b\n";
  std::string::iterator it = str.begin(), begin = str.begin(), end = str.end();

  qi::rule<std::string::iterator, std::string()> vertex = +qi::alnum;
  qi::rule<std::string::iterator, std::string()> edge = vertex >> " -> " >> vertex;
  qi::rule<std::string::iterator> start = (vertex[&print] | edge[&print]) % qi::eol[&print_eol];

  bool matched = parse(it, end, start);

  if (matched) {
    printf("matched\n");
  }
  if (it == end) {
    printf("all\n");
  } else if (it != begin) {
    printf("some\n");
  } else {
    printf("none\n");
  }

  return 0;
}

Output:

$ ./a.out
a
eol
b
eol
a
matched
some

I'm using Boost 1.57.0 and Clang 3.5.1 on MSYS2.

1 Answers1

1

It seems that you answered your own question: yes, it's because PEG grammars are inherently greedy, left-to-right matching.

Notes

  • printing from a semantic action is not a fair test, because even when the parser does backtrack (it can continue the next alternative branch on failure of a single expression) the side effect will already have fired. This is a major draw back of using Semantic Actions - unless you're careful (Boost Spirit: "Semantic actions are evil"?)

  • BOOST_SPIRIT_DEBUG and BOOST_SPIRIT_DEBUG_NODES macros are there fopr this purpose and give a lot more information

  • The obvious solutions here would be

    1. reorder the branches:

      start = (edge[&print] | vertex[&print]) % qi::eol[&print_eol];
      

      Live On Coliru

      a
      eol
      b
      eol
      a -> b
      matched all
      
    2. left - factorisation:

      start = (vertex[print] >> -(" -> " >> vertex)[print]) % qi::eol[&print_eol];
      

      Live On Coliru

      a
      eol
      b
      eol
      a
      b
      matched all
      

If you want some real-life ideas beyond satisfying your curiosity about PEG grammars, you can search SO answers for ideas how to parse into separate collections of vertices/edges in roughly a dozen answers on SO, as I remember.

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Thanks for the answer. I did found some questions about graph parsing, but none answered my question in detail. What I don't understand is why the parser did not backtrack. Greedy makes it parse the vertex rule first, but then it fails to parse the rest of the input and thus it is unable to complete the start rule, which is the one it is trying to complete. If he backtracked, it could parse the input. The parser is expanding the start rule 2 times when it could expand it 3. (I am not used to PEGs.) I thought it would look at the alternates when the rule expansion failed. –  Mar 11 '15 at 13:17
  • @joca.bt It does. But in your sample `vertex[&print]` matches, so the first branch is successful. The second branch is never tried. You needed to re-order the branches to begin with, so the first branch can fail, and the second - which is a subset of it - one can be used as a fallback.is a subset – sehe Mar 11 '15 at 19:20
  • The first branch is successful but the start rule is not expanded completely while it could (it's missing the eol to be fully expanded). –  Mar 11 '15 at 19:32
  • 1
    That's PEG grammar for you indeed. This is what I mean with greedy. The alternative parser already succeeded, and will therefore no longer backtrack (likewise, Kleen Star or Plus will not backtrack after consuming matches) – sehe Mar 11 '15 at 19:32