2

Warning; while I tried to shorten the code down, to a minimum. I still had to include quite a bit, to ensure that the required information was present.

This code, compiles files, and runs resulting in a syntax error;

name = simple_name      [ qi::_val = qi::_1 ]
     | qualified_name   [ qi::_val = qi::_1 ]
     ;

While this;

name = qualified_name   [ qi::_val = qi::_1 ]
     | simple_name      [ qi::_val = qi::_1 ]
     ;

Results in a SIGSEGV, Segmentation fault;

boost::detail::function::function_obj_invoker4<boost::spirit::qi::detail::parser_binder<boost::spirit::qi::alternative<boost::fusion::cons<boost::spirit::qi::action<boost::spirit::qi::reference<boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<char*, std::string>, boost::mpl::vector<std::string, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<false>, unsigned long>, boost::spirit::lex::lexertl::detail::data, __gnu_cxx::__normal_iterator<char*, std::string>, mpl_::bool_<true>, mpl_::bool_<false> > >, Ast::name* (), boost::spirit::unused_type, boost::spirit::unused_type, boost::spirit::unused_type> const>, boost::phoenix::actor<boost::proto::exprns_::expr<boost::proto::tagns_::tag::assign, boost::proto::argsns_::list2<boost::proto::exprns_::expr<boost::proto::tagns_::tag::terminal, boost::proto::argsns_::term<boost::spirit::attribute<0> >,0l>,boost::phoenix::actor<boost::spirit::argument<0> > >, 2l> > >,boost::fusion::cons<boost::spirit::qi::action<boost::spirit::qi::reference<boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<char*, std::string>,boost::mpl::vector<std::string, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<false>,unsigned long>, boost::spirit::lex::lexertl::detail::data, __gnu_cxx::__normal_iterator<char*,std::string>, mpl_::bool_<true>, mpl_::bool_<false> > >, Ast::name* (), ... more to come ...

Where;

simple_name = (tok.identifier) [ qi::_val = build_simple_name_(qi::_1) ];

And;

qualified_name = (name >> qi::raw_token(DOT) >> tok.identifier) [ qi::_val = build_qualified_name_(qi::_1, qi::_2) ] ;

All of these rules, return a Ast::name*();

qi::rule<Iterator, Ast::name*()> name;
qi::rule<Iterator, Ast::name*()> simple_name;
qi::rule<Iterator, Ast::name*()> qualified_name;

The helper functions are defined as;

Ast::name* build_simple_name(std::string str)
{
    return (new Ast::name_simple(Ast::identifier(str)));
}
BOOST_PHOENIX_ADAPT_FUNCTION(Ast::name*, build_simple_name_, build_simple_name, 1)

And;

Ast::name* build_qualified_name(Ast::name* name, std::string str)
{
    std::list<Ast::identifier> qualified_name = Ast::name_to_identifier_list(name);
    qualified_name.push_back(Ast::identifier(str));

    return (new Ast::name_qualified(qualified_name));
}
BOOST_PHOENIX_ADAPT_FUNCTION(Ast::name*, build_qualified_name_, build_qualified_name, 2)

The lexer definitions used are defined as;

lex::token_def<std::string> identifier = "{JAVA_LETTER}{JAVA_LETTER_OR_DIGIT}*";

And;

('.', DOT)

Where the patterns {JAVA_LETTER} and {JAVA_LETTER_OR_DIGIT} are defined as;

("DIGIT",           "[0-9]")
("LATIN1_LETTER",   "[A-Z]|[a-z]")
("JAVA_LETTER",     "{LATIN1_LETTER}|$|_")
("JAVA_LETTER_OR_DIGIT", "{JAVA_LETTER}|{DIGIT}")

My input, is a simple string;

package a.D;

Which lexes to the tokens;

Keywords : package
Identifier : a
Delimiters : .
Identifier : D
Delimiters : ;

Where the first example (with simple_name first), throws a syntax error as;

Syntax Error at line 1:
package a.D;
          ^^

And the last example simply throws an segfault, with the error posted previously.

Clearly the second example is what I want, as it should try to match the complex expression, before the simple one.

Does anyone see why the code crashes, or how I would go about figuring out? - Also should this be at code review?

sehe
  • 374,641
  • 47
  • 450
  • 633
Skeen
  • 4,614
  • 5
  • 41
  • 67
  • You have left recursion, name depends on qualified_name which depends on name and so on. – llonesmiz Sep 04 '13 at 11:29
  • Is this illegal? - Also if that's the case, how can I obtain the same effect, that is how can I refactor it? – Skeen Sep 04 '13 at 11:47

1 Answers1

2

The problem is that you have a left recursive grammar and that cannot be used with Boost.Spirit. What you have is basically:

name = identifier | name >> dot >> identifier;

As you can see here, in order to remove the left recursion when you have something like:

A = A >> alpha | beta;

You need to create 2 new "rules":

A = beta >> A_tail;
A_tail = eps | alpha >> A_tail;

In your case:

A := name
alpha := dot >> identifier
beta := identifier

So your "rules" would be:

name = identifier >> name_tail;
name_tail = eps | dot >> identifier >> A_tail;

If you look closely at name_tail you can see that it literally means: either nothing or dot >> identifier followed by either nothing or dot >> identifier and so on. That means that name_tail is:

name_tail = *(dot >> identifier);

So finally your name rule would be:

name = identifier >> *(dot >> identifier);

All of this is correct, but there is a very good chance that it will not work with your attributes.

llonesmiz
  • 155
  • 2
  • 11
  • 20
  • Uhm, I have a question actually, as it seems there's a deterministic way of removing left-recursion, is there a specific reason why boost::spirit, doesn't 'simply' preprocess the grammar, in order to eliminate it. - Is it impossible, unfeasible or simply a feature that no one has implemented? – Skeen Sep 04 '13 at 12:32
  • No idea, I have actually tried to implement exactly that, without success. I think that it would be something really cool. Have you been able to change your semantic actions to adapt to this, or should I try to do something about it? – llonesmiz Sep 04 '13 at 12:58
  • @Skeen it can't because you can have custom parsers (not to mention semantic actions) that violate the associativity laws that this would assume. Of course, it's something that "they" could allow for using traits, but it's far easier to let the human do the thinking :/ – sehe Sep 04 '13 at 13:07
  • As a rule, I don't think there are any expression transformations (except, possibly, implicitly the subtraction of `char_` classes) in Spirit Qi – sehe Sep 04 '13 at 13:08
  • @cv_and_he: I've successfully changed my semantic actions to adapt this, and it works rather well! One question however, can I inside my builder function, cause a parse failure somehow? (Currently I just hacked it together with an `assert(false)`, but this isn't a pretty solution to me). – Skeen Sep 04 '13 at 13:24
  • @sehe: It makes sense that one cannot do it, whenever you start doing something that is outside of the context-free-grammar realm. Just as my code wouldn't be able to have been auto re-factored, as that would also require changing my 'evil' semantic actions. – Skeen Sep 04 '13 at 13:26
  • On the opposite end of the spectrum, [generating static lexer tables](http://www.boost.org/doc/libs/1_54_0/libs/spirit/doc/html/spirit/lex/abstracts/lexer_static_model.html) is (usually) purely deterministic, which is why it is commonplace to do this (CoCo/R, ANTLR, flex/bison, lex/yacc etc. including Spirit Lex) – sehe Sep 04 '13 at 13:36
  • @Skeen I'm sure you can, I'm not sure that this way would work: You'll need to add a `bool& pass` parameter to your builder function and then you would invoke it like this `qi::_val = build_simple_name_(qi::_1, qi::_pass)` (or maybe using `phx::ref(qi::_pass)`). You would need to set that `pass` variable to false when you want your parsing to fail. – llonesmiz Sep 04 '13 at 13:37
  • @cv_and_he: And then assign it `false`, for the parsing to fail, right? – Skeen Sep 04 '13 at 13:39
  • Oh aha, I missed that question. Yes `[_pass = false]` (like `qi::eps(false)`) will cause a _non-match_. Also, _function-pointer SA_ s by default take a `bool& pass` parameter. – sehe Sep 04 '13 at 13:40
  • @cv_and_he: Ah lovely, I believe that I've actually got around to reading that at-least in the documentation. – Skeen Sep 04 '13 at 13:40
  • @sehe: Function pointer Semantic Actions, are those what I'm using? - or is that what you get, when you just put a function pointer inside the 'semantic action' braces? – Skeen Sep 04 '13 at 13:47
  • 1
    @Skeen He meant the latter, like [here](http://stackoverflow.com/questions/3066701/boost-spirit-semantic-action-parameters/3067881#3067881). – llonesmiz Sep 04 '13 at 13:49