On 09/26/2015 01:45 AM, Littlefield, Tyler wrote:
Hello all:
I'm getting a segfault when I run this. It looks like the debug
prints, but when I debug it I just get an endless loop of backtrace.
If anyone can help point me in the right direction, I'd appreciate it.
I'd also appreciate, if possible any tips/tricks for cleaning up this
grammar.
First of all, it doesn't compile.
It shouldn't compile, since the grammar doesn't expose an attribute (did you mean list<Rule>()
instead of list<Rule>
?).
But you could never assign that to the output
variable (which is std::vector<std::string>
?!?)
Likewise you forget parentheses here
qi::rule<Iterator, RHS(), ascii::space_type> rhs;
qi::rule<Iterator, Rule(), ascii::space_type> rule;
qi::rule<Iterator, std::list<Rule>(), ascii::space_type> rules;
The rhs rule has infinite left-recursion:
rhs = terminal
| identifier
| ('[' >> rhs >> ']')
| ('{' >> rhs >> '}')
| ('(' >> rhs >> ')')
| (rhs >> '|' >> rhs) // OOPS
| (rhs >> ',' >> rhs) // OOPS
;
This likely explains the crashes because it leads to stackoverflow.
Notes
The livestream recording (part #1 and part #2) shows exactly the steps I took to cleanup the grammar first, and subsequently make things actually compile-worthy.
There was quite a lot of work there:
- cleanup: use implicit
qi::lit
for the interpunction ([]
, {}
, ()
and =
,|
,,
)
- use kleene+ instead of
a >> *a
(in more than one occasion)
- prefer the
%
parser to parse... lists
I had to "wiggle" the rules around "RHS" a bit; There was the infinite recursion in the last two branches (see // OOPS
). I fixed it by introducing a "pure" expression rule (which parses just a single RHS
structure. I've renamed this type to Expression
.
The "list" parsing (that recognizes lists of expressions separated by ,
or |
) is moved into the original rhs
rule, which I renamed to expr_list
to be more descriptive:
expression = qi::attr(Parser::ExprType::Terminal) >> terminal
| qi::attr(Parser::ExprType::Identifier) >> identifier
| qi::attr(Parser::ExprType::Compound) >> qi::raw [ '[' >> expr_list >> ']' ]
| qi::attr(Parser::ExprType::Compound) >> qi::raw [ '{' >> expr_list >> '}' ]
| qi::attr(Parser::ExprType::Compound) >> qi::raw [ '(' >> expr_list >> ')' ]
;
expr_list = expression % (char_("|,")) // TODO FIXME?
;
In order for the synthesized attributes to actually transform into the RHS
(now: Expression
) type, we need to actually expose a RHSType
(now: ExprType
) value for the first adapted member. You can see in the above lines that we used qi::attr()
for this purpose.
Modern compilers and boost versions can simplify the BOOST_FUSION_ADAPT_STRUCT
invocations considerably:
BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)
I "upgraded" some rules to be lexemes, meaning they don't obey a skipper.
I guessed that I should probably add the space character to the acceptable character set inside a terminal
(string literal). If that's not what you want, just remove the last character from char_("[][]()<>\'\"=|.,;_ ");
.
I also changed the skipper to be blank_type
as that will not skip newline characters. You could use this to easily parse multi-line input directly with the same grammar. Do, e.g.:
rules = rule % qi::eol;
See also: Boost spirit skipper issues for additional information regarding skippers, lexemes and their interaction(s).
Without further ado, here is a working example:
Live On Coliru
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace Parser
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
enum class ExprType { Terminal, Identifier, Compound };
static inline std::ostream& operator<<(std::ostream& os, ExprType type) {
switch (type) {
case ExprType::Terminal: return os << "Terminal";
case ExprType::Identifier: return os << "Identifier";
case ExprType::Compound: return os << "Compound";
}
return os << "(unknown)";
}
struct Expression { // TODO make recursive (see `boost::make_recursive_variant`)
ExprType type;
std::string value;
};
using ExprList = std::vector<Expression>;
struct Rule {
std::string identifier; // lhs
ExprList expr_list;
};
}
//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)
namespace Parser
{
typedef std::list<Rule> RuleList;
//our grammar definition
template <typename Iterator>
struct Grammar: qi::grammar<Iterator, RuleList(), ascii::blank_type>
{
Grammar(): Grammar::base_type(rules)
{
qi::char_type char_;
symbol = char_("[][]()<>\'\"=|.,;_ ");
character = qi::alpha | qi::digit | symbol;
identifier = qi::alpha >> *(qi::alnum | char_('_'));
// TODO capture strings including interpunction(?)
terminal = ('\'' >> +(character - '\'') >> '\'')
| ('\"' >> +(character - '\"') >> '\"');
expression = qi::attr(Parser::ExprType::Terminal) >> terminal
| qi::attr(Parser::ExprType::Identifier) >> identifier
| qi::attr(Parser::ExprType::Compound) >> qi::raw [ '[' >> expr_list >> ']' ]
| qi::attr(Parser::ExprType::Compound) >> qi::raw [ '{' >> expr_list >> '}' ]
| qi::attr(Parser::ExprType::Compound) >> qi::raw [ '(' >> expr_list >> ')' ]
;
expr_list = expression % (char_("|,")) // TODO FIXME?
;
// above accepts mixed separators:
// a, b, c | d, e
//
// original accepted:
//
// a, b, [ c | d ], e
// a| b| [ c , d ]| e
// a| b| [ c | d ]| e
// a, b, [ c , d ], e
rule = identifier >> '=' >> expr_list;
//rules = rule % qi::eol; // alternatively, parse multi-line input in one go
rules = +rule;
BOOST_SPIRIT_DEBUG_NODES((rules)(rule)(expr_list)(expression)(identifier)(terminal))
}
private:
qi::rule<Iterator, Expression(), ascii::blank_type> expression;
qi::rule<Iterator, ExprList(), ascii::blank_type> expr_list;
qi::rule<Iterator, Rule(), ascii::blank_type> rule;
qi::rule<Iterator, RuleList(), ascii::blank_type> rules;
// lexemes:
qi::rule<Iterator, std::string()> terminal, identifier;
qi::rule<Iterator, char()> symbol, character;
};
}
int main() {
using It = std::string::const_iterator;
Parser::Grammar<It> parser;
boost::spirit::ascii::blank_type blank;
std::string input;
while (std::getline(std::cin, input))
{
if (input.empty()) {
break;
}
It it = input.begin(), itEnd = input.end();
Parser::RuleList output;
bool result = phrase_parse(it, itEnd, parser, blank, output);
if (result) {
std::cout << "success\n";
for (auto& rule : output) {
std::cout << "\ntarget: " << rule.identifier << "\n";
for (auto& rhs : rule.expr_list) {
std::cout << "rhs: " << boost::fusion::as_vector(rhs) << "\n";
}
}
} else {
std::cout << "parse failed\n";
}
if (it != itEnd)
std::cout << "remaining unparsed: '" << std::string(it, itEnd) << "\n";
}
}
Prints the output:
success
target: assigned1
rhs: (Identifier some_var)
rhs: (Terminal a 'string' value)
rhs: (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs: (Terminal done)
success
target: assigned2
rhs: (Compound { a })
And with debug enabled (#define BOOST_SPIRIT_DEBUG
):
<rules>
<try>assigned1 = some_var</try>
<rule>
<try>assigned1 = some_var</try>
<identifier>
<try>assigned1 = some_var</try>
<success> = some_var | "a 'st</success>
<attributes>[[a, s, s, i, g, n, e, d, 1]]</attributes>
</identifier>
<expr_list>
<try> some_var | "a 'stri</try>
<expression>
<try> some_var | "a 'stri</try>
<terminal>
<try>some_var | "a 'strin</try>
<fail/>
</terminal>
<identifier>
<try>some_var | "a 'strin</try>
<success> | "a 'string' value</success>
<attributes>[[s, o, m, e, _, v, a, r]]</attributes>
</identifier>
<success> | "a 'string' value</success>
<attributes>[[Identifier, [s, o, m, e, _, v, a, r]]]</attributes>
</expression>
<expression>
<try> "a 'string' value" </try>
<terminal>
<try>"a 'string' value" |</try>
<success> | [ 'okay', "this",</success>
<attributes>[[a, , ', s, t, r, i, n, g, ', , v, a, l, u, e]]</attributes>
</terminal>
<success> | [ 'okay', "this",</success>
<attributes>[[Terminal, [a, , ', s, t, r, i, n, g, ', , v, a, l, u, e]]]</attributes>
</expression>
<expression>
<try> [ 'okay', "this", i</try>
<terminal>
<try>[ 'okay', "this", is</try>
<fail/>
</terminal>
<identifier>
<try>[ 'okay', "this", is</try>
<fail/>
</identifier>
<expr_list>
<try> 'okay', "this", is_</try>
<expression>
<try> 'okay', "this", is_</try>
<terminal>
<try>'okay', "this", is_a</try>
<success>, "this", is_another</success>
<attributes>[[o, k, a, y]]</attributes>
</terminal>
<success>, "this", is_another</success>
<attributes>[[Terminal, [o, k, a, y]]]</attributes>
</expression>
<expression>
<try> "this", is_another,</try>
<terminal>
<try>"this", is_another, </try>
<success>, is_another, identi</success>
<attributes>[[t, h, i, s]]</attributes>
</terminal>
<success>, is_another, identi</success>
<attributes>[[Terminal, [t, h, i, s]]]</attributes>
</expression>
<expression>
<try> is_another, identif</try>
<terminal>
<try>is_another, identifi</try>
<fail/>
</terminal>
<identifier>
<try>is_another, identifi</try>
<success>, identifier, { ( "n</success>
<attributes>[[i, s, _, a, n, o, t, h, e, r]]</attributes>
</identifier>
<success>, identifier, { ( "n</success>
<attributes>[[Identifier, [i, s, _, a, n, o, t, h, e, r]]]</attributes>
</expression>
<expression>
<try> identifier, { ( "ne</try>
<terminal>
<try>identifier, { ( "nes</try>
<fail/>
</terminal>
<identifier>
<try>identifier, { ( "nes</try>
<success>, { ( "nested" ) } ]</success>
<attributes>[[i, d, e, n, t, i, f, i, e, r]]</attributes>
</identifier>
<success>, { ( "nested" ) } ]</success>
<attributes>[[Identifier, [i, d, e, n, t, i, f, i, e, r]]]</attributes>
</expression>
<expression>
<try> { ( "nested" ) } ] </try>
<terminal>
<try>{ ( "nested" ) } ] |</try>
<fail/>
</terminal>
<identifier>
<try>{ ( "nested" ) } ] |</try>
<fail/>
</identifier>
<expr_list>
<try> ( "nested" ) } ] | </try>
<expression>
<try> ( "nested" ) } ] | </try>
<terminal>
<try>( "nested" ) } ] | "</try>
<fail/>
</terminal>
<identifier>
<try>( "nested" ) } ] | "</try>
<fail/>
</identifier>
<expr_list>
<try> "nested" ) } ] | "d</try>
<expression>
<try> "nested" ) } ] | "d</try>
<terminal>
<try>"nested" ) } ] | "do</try>
<success> ) } ] | "done"</success>
<attributes>[[n, e, s, t, e, d]]</attributes>
</terminal>
<success> ) } ] | "done"</success>
<attributes>[[Terminal, [n, e, s, t, e, d]]]</attributes>
</expression>
<success> ) } ] | "done"</success>
<attributes>[[[Terminal, [n, e, s, t, e, d]]]]</attributes>
</expr_list>
<success> } ] | "done"</success>
<attributes>[[Compound, [(, , ", n, e, s, t, e, d, ", , )]]]</attributes>
</expression>
<success> } ] | "done"</success>
<attributes>[[[Compound, [(, , ", n, e, s, t, e, d, ", , )]]]]</attributes>
</expr_list>
<success> ] | "done"</success>
<attributes>[[Compound, [{, , (, , ", n, e, s, t, e, d, ", , ), , }]]]</attributes>
</expression>
<success> ] | "done"</success>
<attributes>[[[Terminal, [o, k, a, y]], [Terminal, [t, h, i, s]], [Identifier, [i, s, _, a, n, o, t, h, e, r]], [Identifier, [i, d, e, n, t, i, f, i, e, r]], [Compound, [{, , (, , ", n, e, s, t, e, d, ", , ), , }]]]]</attributes>
</expr_list>
<success> | "done"</success>
<attributes>[[Compound, [[, , ', o, k, a, y, ', ,, , ", t, h, i, s, ", ,, , i, s, _, a, n, o, t, h, e, r, ,, , i, d, e, n, t, i, f, i, e, r, ,, , {, , (, , ", n, e, s, t, e, d, ", , ), , }, , ]]]]</attributes>
</expression>
<expression>
<try> "done"</try>
<terminal>
<try>"done"</try>
<success></success>
<attributes>[[d, o, n, e]]</attributes>
</terminal>
<success></success>
<attributes>[[Terminal, [d, o, n, e]]]</attributes>
</expression>
<success></success>
<attributes>[[[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a, , ', s, t, r, i, n, g, ', , v, a, l, u, e]], [Compound, [[, , ', o, k, a, y, ', ,, , ", t, h, i, s, ", ,, , i, s, _, a, n, o, t, h, e, r, ,, , i, d, e, n, t, i, f, i, e, r, ,, , {, , (, , ", n, e, s, t, e, d, ", , ), , }, , ]]], [Terminal, [d, o, n, e]]]]</attributes>
</expr_list>
<success></success>
<attributes>[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a, , ', s, t, r, i, n, g, ', , v, a, l, u, e]], [Compound, [[, , ', o, k, a, y, ', ,, , ", t, h, i, s, ", ,, , i, s, _, a, n, o, t, h, e, r, ,, , i, d, e, n, t, i, f, i, e, r, ,, , {, , (, , ", n, e, s, t, e, d, ", , ), , }, , ]]], [Terminal, [d, o, n, e]]]]]</attributes>
</rule>
<rule>
<try></try>
<identifier>
<try></try>
<fail/>
</identifier>
<fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a, , ', s, t, r, i, n, g, ', , v, a, l, u, e]], [Compound, [[, , ', o, k, a, y, ', ,, , ", t, h, i, s, ", ,, , i, s, _, a, n, o, t, h, e, r, ,, , i, d, e, n, t, i, f, i, e, r, ,, , {, , (, , ", n, e, s, t, e, d, ", , ), , }, , ]]], [Terminal, [d, o, n, e]]]]]]</attributes>
</rules>
success
target: assigned1
rhs: (Identifier some_var)
rhs: (Terminal a 'string' value)
rhs: (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs: (Terminal done)
<rules>
<try>assigned2 = { a }</try>
<rule>
<try>assigned2 = { a }</try>
<identifier>
<try>assigned2 = { a }</try>
<success> = { a }</success>
<attributes>[[a, s, s, i, g, n, e, d, 2]]</attributes>
</identifier>
<expr_list>
<try> { a }</try>
<expression>
<try> { a }</try>
<terminal>
<try>{ a }</try>
<fail/>
</terminal>
<identifier>
<try>{ a }</try>
<fail/>
</identifier>
<expr_list>
<try> a }</try>
<expression>
<try> a }</try>
<terminal>
<try>a }</try>
<fail/>
</terminal>
<identifier>
<try>a }</try>
<success> }</success>
<attributes>[[a]]</attributes>
</identifier>
<success> }</success>
<attributes>[[Identifier, [a]]]</attributes>
</expression>
<success> }</success>
<attributes>[[[Identifier, [a]]]]</attributes>
</expr_list>
<success></success>
<attributes>[[Compound, [{, , a, , }]]]</attributes>
</expression>
<success></success>
<attributes>[[[Compound, [{, , a, , }]]]]</attributes>
</expr_list>
<success></success>
<attributes>[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{, , a, , }]]]]]</attributes>
</rule>
<rule>
<try></try>
<identifier>
<try></try>
<fail/>
</identifier>
<fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{, , a, , }]]]]]]</attributes>
</rules>
success
target: assigned2
rhs: (Compound { a })
Lose Ends
There are some TODO's left in the code. Implementing them would require some more effort, but I'm not sure it's actually the right direction to go in, so I'll await feedback :)
One of the elementary TODO's left would be to represent the recursive nature of expressions in the AST proper. Right now, I "copped out" by just inserting the source string for a nested Compound expression.