You're running into trouble because expression templates keep internal references to temporaries.
Simply aggregate the sub-parser instances:
template <typename Iter=std::string::iterator>
struct value_path : grammar<Iter, boost::tuple<std::vector<std::string>, std::string>()> {
value_path() : value_path::base_type(start)
{
start = -(module_path_ >> '.') >> value_name_;
}
private:
rule<Iter, boost::tuple<std::vector<std::string>, std::string>()> start;
module_path<Iter> module_path_;
value_name<Iter> value_name_;
};
Notes I feel it might be a design smell to use separate sub-grammars for such small items. Although grammar decomposition is frequently a good idea to keep build times manageable and code size somewhat lower, but it seems - from the description here - you might be overdoing things.
The "plastering" of parser expressions behind a qi::rule
(effectively type erasure) comes with a possibly significant runtime overhead. If you subsequently instantiate those for more than a single iterator type, you may be compounding this with unnecessary growth of the binary.
UPDATE Regarding the idiomatic way to compose your grammars in Spirit, here's my take:
Live On Coliru
using namespace ascii;
using qi::raw;
lowercase_ident = raw[ (lower | '_') >> *(alnum | '_' | '\'') ];
module_path_item = raw[ upper >> *(alnum | '_' | '\'') ];
module_path_ = module_path_item % '.';
auto special_char = boost::proto::deep_copy(char_("-+!$%&*./:<=>?@^|~"));
operator_name = qi::raw [
('!' >> *special_char) /* branch 1 */
| (char_("~?") >> +special_char) /* branch 2 */
| (!char_(".:") >> special_char >> *special_char) /* branch 3 */
| "mod" /* branch 4 */
| "lor" | "lsl" | "lsr" | "asr" | "or" /* branch 5-9 */
| "-." /* branch 10 doesn't match because of branch 3 */
| "!=" | "||" | "&&" | ":=" /* branch 11-14 doesn't match because of branch 1,3 */
// | (special_char - char_("!$%./:?@^|~")) /* "*+=<>&-" cannot match because of branch 3 */
]
;
value_name_ =
lowercase_ident
| '(' >> operator_name >> ')'
;
start = -(module_path_ >> '.') >> value_name_;
Where the rules are fields declared as:
qi::rule<Iter, ast::value_path(), Skipper> start;
qi::rule<Iter, ast::module_path(), Skipper> module_path_;
// lexeme: (no skipper)
qi::rule<Iter, std::string()> value_name_, module_path_item, lowercase_ident, operator_name;
Notes:
- I've added a skipper, because since your
value_path
grammar didn't use one, any skipper you passed into qi::phrase_parse
was being ignored
- The lexemes just drop the skipper from the rule declaration type, so you don't even need to specify
qi::lexeme[]
- In the lexemes, I copied your intention to just copy the parsed text verbatim using
qi::raw
. This allows us to write grammars more succinctly (using '!'
instead of char_('!')
, "mod"
instead of qi::string("mod")
). Note that bare literals are implicitly transformed into "non-capturing" qi::lit(...)
nodes in the context of a Qi parser expression, but since we used raw[]
anyways, the fact that lit
doesn't capture an attribute is not a problem.
I think this results in a perfectly cromulent grammar definition that should satisfy your criteria for "high-level". There's some wtf-y-ness with the grammar itself (regardless of its expression any parser generator language, likely):
- I've simplified the
operator_name
rule by removing nesting of alternative branches that will result in the same effect as the simplified flat alternative list
- I've refactored the "magic" lists of special characters into
special_chars
In alternative branch 3, e.g., I've noted the exceptions with a negative assertion:
(!char_(".:") >> special_char >> *special_char) /* branch 3 */
The !char_(".:")
assertion says: when the input wouldn't match '.'
or ':'
continue matching (any sequence of special characters). In fact you could equivalently write this as:
((special_char - '.' - ':') >> *special_char) /* branch 3 */
or even, as I ended up writing it:
(!char_(".:") >> +special_char) /* branch 3 */
The simplification of the branches actually raises the level of abstraction! It becomes clear now, that some of the branches will never match, because earlier branches match the input by definition:
| "-." /* branch 10 doesn't match because of branch 3 */
| "!=" | "||" | "&&" | ":=" /* branch 11-14 doesn't match because of branch 1,3 */
// | (special_char - char_("!$%./:?@^|~")) /* "*+=<>&-" cannot match because of branch 3 */
I hope you can see why I qualify this part of the grammar as "a little bit wtf-y" :) I'll assume for now that you got confused or something went wrong when you reduces it to a single rules (your "fool's errand").
Some further improvements to be noted:
- I've added a proper AST struct instead of the
boost::tuple<>
to make the code more legible
- I've added BOOST_SPIRIT_DEBUG* macros so you can debug your grammar at a high level (the rule level)
- I've ditched the blanket
using namespace
. This is generally a bad idea. And with Spirit it is frequently a very bad idea (it can lead to ambiguities that are unsolvable, or to very hard to spot errors). As you can see, it doesn't necessarily lead to very verbose code.
Full Listing
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace ast {
using module_path = std::vector<std::string>;
struct value_path {
module_path module;
std::string value_expr;
};
}
BOOST_FUSION_ADAPT_STRUCT(ast::value_path, (ast::module_path, module)(std::string,value_expr))
template <typename Iter, typename Skipper = ascii::space_type>
struct value_path : qi::grammar<Iter, ast::value_path(), Skipper> {
value_path() : value_path::base_type(start)
{
using namespace ascii;
using qi::raw;
lowercase_ident = raw[ (lower | '_') >> *(alnum | '_' | '\'') ];
module_path_item = raw[ upper >> *(alnum | '_' | '\'') ];
module_path_ = module_path_item % '.';
auto special_char = boost::proto::deep_copy(char_("-+!$%&*./:<=>?@^|~"));
operator_name = qi::raw [
('!' >> *special_char) /* branch 1 */
| (char_("~?") >> +special_char) /* branch 2 */
| (!char_(".:") >> +special_char) /* branch 3 */
| "mod" /* branch 4 */
| "lor" | "lsl" | "lsr" | "asr" | "or" /* branch 5-9 */
| "-." /* branch 10 doesn't match because of branch 3 */
| "!=" | "||" | "&&" | ":=" /* branch 11-14 doesn't match because of branch 1,3 */
// | (special_char - char_("!$%./:?@^|~")) /* "*+=<>&-" cannot match because of branch 3 */
]
;
value_name_ =
lowercase_ident
| '(' >> operator_name >> ')'
;
start = -(module_path_ >> '.') >> value_name_;
BOOST_SPIRIT_DEBUG_NODES((start)(module_path_)(value_name_)(module_path_item)(lowercase_ident)(operator_name))
}
private:
qi::rule<Iter, ast::value_path(), Skipper> start;
qi::rule<Iter, ast::module_path(), Skipper> module_path_;
// lexeme: (no skipper)
qi::rule<Iter, std::string()> value_name_, module_path_item, lowercase_ident, operator_name;
};
int main()
{
for (std::string const input : {
"Some.Module.Package.ident",
"ident",
"A.B.C_.mod", // as lowercase_ident
"A.B.C_.(mod)", // as operator_name (branch 4)
"A.B.C_.(!=)", // as operator_name (branch 1)
"(!)" // as operator_name (branch 1)
})
{
std::cout << "--------------------------------------------------------------\n";
std::cout << "Parsing '" << input << "'\n";
using It = std::string::const_iterator;
It f(input.begin()), l(input.end());
value_path<It> g;
ast::value_path data;
bool ok = qi::phrase_parse(f, l, g, ascii::space, data);
if (ok) {
std::cout << "Parse succeeded\n";
} else {
std::cout << "Parse failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Debug Output
--------------------------------------------------------------
Parsing 'Some.Module.Package.ident'
<start>
<try>Some.Module.Package.</try>
<module_path_>
<try>Some.Module.Package.</try>
<module_path_item>
<try>Some.Module.Package.</try>
<success>.Module.Package.iden</success>
<attributes>[[S, o, m, e]]</attributes>
</module_path_item>
<module_path_item>
<try>Module.Package.ident</try>
<success>.Package.ident</success>
<attributes>[[M, o, d, u, l, e]]</attributes>
</module_path_item>
<module_path_item>
<try>Package.ident</try>
<success>.ident</success>
<attributes>[[P, a, c, k, a, g, e]]</attributes>
</module_path_item>
<module_path_item>
<try>ident</try>
<fail/>
</module_path_item>
<success>.ident</success>
<attributes>[[[S, o, m, e], [M, o, d, u, l, e], [P, a, c, k, a, g, e]]]</attributes>
</module_path_>
<value_name_>
<try>ident</try>
<lowercase_ident>
<try>ident</try>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</lowercase_ident>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[S, o, m, e], [M, o, d, u, l, e], [P, a, c, k, a, g, e]], [i, d, e, n, t]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'ident'
<start>
<try>ident</try>
<module_path_>
<try>ident</try>
<module_path_item>
<try>ident</try>
<fail/>
</module_path_item>
<fail/>
</module_path_>
<value_name_>
<try>ident</try>
<lowercase_ident>
<try>ident</try>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</lowercase_ident>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</value_name_>
<success></success>
<attributes>[[[], [i, d, e, n, t]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.mod'
<start>
<try>A.B.C_.mod</try>
<module_path_>
<try>A.B.C_.mod</try>
<module_path_item>
<try>A.B.C_.mod</try>
<success>.B.C_.mod</success>
<attributes>[[A]]</attributes>
</module_path_item>
<module_path_item>
<try>B.C_.mod</try>
<success>.C_.mod</success>
<attributes>[[B]]</attributes>
</module_path_item>
<module_path_item>
<try>C_.mod</try>
<success>.mod</success>
<attributes>[[C, _]]</attributes>
</module_path_item>
<module_path_item>
<try>mod</try>
<fail/>
</module_path_item>
<success>.mod</success>
<attributes>[[[A], [B], [C, _]]]</attributes>
</module_path_>
<value_name_>
<try>mod</try>
<lowercase_ident>
<try>mod</try>
<success></success>
<attributes>[[m, o, d]]</attributes>
</lowercase_ident>
<success></success>
<attributes>[[m, o, d]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[A], [B], [C, _]], [m, o, d]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.(mod)'
<start>
<try>A.B.C_.(mod)</try>
<module_path_>
<try>A.B.C_.(mod)</try>
<module_path_item>
<try>A.B.C_.(mod)</try>
<success>.B.C_.(mod)</success>
<attributes>[[A]]</attributes>
</module_path_item>
<module_path_item>
<try>B.C_.(mod)</try>
<success>.C_.(mod)</success>
<attributes>[[B]]</attributes>
</module_path_item>
<module_path_item>
<try>C_.(mod)</try>
<success>.(mod)</success>
<attributes>[[C, _]]</attributes>
</module_path_item>
<module_path_item>
<try>(mod)</try>
<fail/>
</module_path_item>
<success>.(mod)</success>
<attributes>[[[A], [B], [C, _]]]</attributes>
</module_path_>
<value_name_>
<try>(mod)</try>
<lowercase_ident>
<try>(mod)</try>
<fail/>
</lowercase_ident>
<operator_name>
<try>mod)</try>
<success>)</success>
<attributes>[[m, o, d]]</attributes>
</operator_name>
<success></success>
<attributes>[[m, o, d]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[A], [B], [C, _]], [m, o, d]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.(!=)'
<start>
<try>A.B.C_.(!=)</try>
<module_path_>
<try>A.B.C_.(!=)</try>
<module_path_item>
<try>A.B.C_.(!=)</try>
<success>.B.C_.(!=)</success>
<attributes>[[A]]</attributes>
</module_path_item>
<module_path_item>
<try>B.C_.(!=)</try>
<success>.C_.(!=)</success>
<attributes>[[B]]</attributes>
</module_path_item>
<module_path_item>
<try>C_.(!=)</try>
<success>.(!=)</success>
<attributes>[[C, _]]</attributes>
</module_path_item>
<module_path_item>
<try>(!=)</try>
<fail/>
</module_path_item>
<success>.(!=)</success>
<attributes>[[[A], [B], [C, _]]]</attributes>
</module_path_>
<value_name_>
<try>(!=)</try>
<lowercase_ident>
<try>(!=)</try>
<fail/>
</lowercase_ident>
<operator_name>
<try>!=)</try>
<success>)</success>
<attributes>[[!, =]]</attributes>
</operator_name>
<success></success>
<attributes>[[!, =]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[A], [B], [C, _]], [!, =]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing '(!)'
<start>
<try>(!)</try>
<module_path_>
<try>(!)</try>
<module_path_item>
<try>(!)</try>
<fail/>
</module_path_item>
<fail/>
</module_path_>
<value_name_>
<try>(!)</try>
<lowercase_ident>
<try>(!)</try>
<fail/>
</lowercase_ident>
<operator_name>
<try>!)</try>
<success>)</success>
<attributes>[[!]]</attributes>
</operator_name>
<success></success>
<attributes>[[!]]</attributes>
</value_name_>
<success></success>
<attributes>[[[], [!]]]</attributes>
</start>
Parse succeeded