1

I'm getting a segfault when I run this. It looks like the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. I'd also appreciate, if possible any tips/tricks for cleaning up this grammar.

Thanks!

//code here:
/***
*I-EBNF parser
*
*This defines a grammar for BNF.
*/

//Speeds up compilation times.
//This is a relatively small grammar, this is useful.
#define BOOST_SPIRIT_NO_PREDEFINED_TERMINALS
#define BOOST_SPIRIT_QI_DEBUG

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/support.hpp>
#include <vector>
#include <string>
#include <iostream>

namespace Parser
{

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

enum class RHSType
{
    Terminal, Identifier
};
struct RHS
{
    RHSType type;
    std::string value;
};
struct Rule
{
    std::string identifier; //lhs
    std::vector<RHS> rhs;
};
}

//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(
    Parser::RHS,
    (Parser::RHSType, type)
    (std::string, value)
)
BOOST_FUSION_ADAPT_STRUCT(
    Parser::Rule,
    (std::string, identifier)
    (std::vector<Parser::RHS>, rhs)
)

namespace Parser
{
typedef std::vector<Rule> RuleList;

//our grammar definition
template <typename Iterator>
struct Grammar: qi::grammar<Iterator, std::list<Rule>, ascii::space_type>
{
    Grammar(): Grammar::base_type(rules)
    {
        qi::char_type char_;

        letter = char_("a-zA-Z");
        digit = char_('0', '9');
        symbol = char_('[') | ']' | '[' | ']' | '(' | ')' | '<' | '>'
| '\'' | '\"' | '=' | '|' | '.' | ',' | ';';
        character = letter | digit | symbol | '_';
        identifier = letter >> *(letter | digit | '_');
        terminal = (char_('\'') >> character >> *character >>
char_('\'')) | (char_('\"') >> character >> *character >> char_('\"'));
        lhs = identifier;
        rhs = terminal | identifier | char_('[') >> rhs >> char_(']')
| char_('{') >> rhs >> char_('}') | char_('(') >> rhs >> char_(')') |
rhs >> char_('|') >> rhs | rhs >> char_(',') >> rhs;
        rule = identifier >> char_('=') >> rhs;
        rules = rule >> *rule;
    }

private:
    qi::rule<Iterator, char(), ascii::space_type> letter, digit,
symbol, character;
    qi::rule<Iterator, std::string(), ascii::space_type> identifier,
lhs, terminal;
    qi::rule<Iterator, RHS, ascii::space_type> rhs;
    qi::rule<Iterator, Rule, ascii::space_type> rule;
    qi::rule<Iterator, std::list<Rule>, ascii::space_type> rules;
};

}

int main()
{
    Parser::Grammar<std::string::const_iterator> parser;
    boost::spirit::ascii::space_type space;
    std::string input;
    std::vector<std::string> output;
    bool result;

    while (std::getline(std::cin, input))
        {
            if (input.empty())
                {
                    break;
                }
            std::string::const_iterator it, itEnd;
            it = input.begin();
            itEnd = input.end();
            result = phrase_parse(it, itEnd, parser, space, output);
            if (result && it == itEnd)
                {
                    std::cout << "success" << std::endl;
                }
        }

    return 0;
}

¹ cross posted from the [spirit-general] mailing list: http://boost.2283326.n4.nabble.com/parser-segfault-tips-tricks-td4680336.html

sehe
  • 374,641
  • 47
  • 450
  • 633
  • 1
    I'm livecoding the answer here: https://www.livecoding.tv/sehe/ ([experiment](http://chat.stackoverflow.com/transcript/10?m=24182469#24182469)) – sehe Sep 26 '15 at 19:29
  • @Nevermore yeah, that's a good point. I came up with the following sample just now (livestream): `assigned1 = some_var | "a 'string' value" | [ 'okay', "this", is_another, identifier, { ( "nested" ) } ] | "done"`. (Note: I posted the question here on behalf of an asker on the mailing list) – sehe Sep 26 '15 at 20:30
  • OK. Btw, couldn't reproduce on gcc 4.9.2 + boost 1.58. – Nevermore Sep 26 '15 at 20:41
  • @Nevermore you mean you couldn't compile it? Or it didn't crash? – sehe Sep 26 '15 at 20:58
  • @Nevermore Well. It depends on the input you use, because you need to trigger the left-recursion (see my answer). Also, there was quite certainly some undefined behaviour, because of the way the attributes were (incorrectly) declared. This might explain why the crash isn't consistent – sehe Sep 26 '15 at 21:08
  • The recorded livestreams of me treating this questions are available here: [part #1](http://tinyurl.com/oqr4hsb) and [part #2](http://tinyurl.com/p46hqxr) – sehe Sep 26 '15 at 21:12

1 Answers1

2

On 09/26/2015 01:45 AM, Littlefield, Tyler wrote:

Hello all: I'm getting a segfault when I run this. It looks like the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. I'd also appreciate, if possible any tips/tricks for cleaning up this grammar.

First of all, it doesn't compile.

It shouldn't compile, since the grammar doesn't expose an attribute (did you mean list<Rule>() instead of list<Rule>?).

But you could never assign that to the output variable (which is std::vector<std::string>?!?)

Likewise you forget parentheses here

qi::rule<Iterator, RHS(), ascii::space_type> rhs;
qi::rule<Iterator, Rule(), ascii::space_type> rule;
qi::rule<Iterator, std::list<Rule>(), ascii::space_type> rules;

The rhs rule has infinite left-recursion:

    rhs               = terminal
                      | identifier
                      | ('[' >> rhs >> ']')
                      | ('{' >> rhs >> '}')
                      | ('(' >> rhs >> ')')
                      | (rhs >> '|' >> rhs)  // OOPS
                      | (rhs >> ',' >> rhs)  // OOPS
                      ;

This likely explains the crashes because it leads to stackoverflow.

Notes

The livestream recording (part #1 and part #2) shows exactly the steps I took to cleanup the grammar first, and subsequently make things actually compile-worthy.

There was quite a lot of work there:

  • cleanup: use implicit qi::lit for the interpunction ([], {}, () and =,|,,)
  • use kleene+ instead of a >> *a (in more than one occasion)
  • prefer the % parser to parse... lists
  • I had to "wiggle" the rules around "RHS" a bit; There was the infinite recursion in the last two branches (see // OOPS). I fixed it by introducing a "pure" expression rule (which parses just a single RHS structure. I've renamed this type to Expression.

    The "list" parsing (that recognizes lists of expressions separated by , or |) is moved into the original rhs rule, which I renamed to expr_list to be more descriptive:

    expression        = qi::attr(Parser::ExprType::Terminal)   >> terminal 
                      | qi::attr(Parser::ExprType::Identifier) >> identifier 
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ] 
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ] 
                      ;
    
    expr_list         = expression % (char_("|,")) // TODO FIXME?
                      ;
    
  • In order for the synthesized attributes to actually transform into the RHS (now: Expression) type, we need to actually expose a RHSType (now: ExprType) value for the first adapted member. You can see in the above lines that we used qi::attr() for this purpose.

  • Modern compilers and boost versions can simplify the BOOST_FUSION_ADAPT_STRUCT invocations considerably:

    BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
    BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)
    
  • I "upgraded" some rules to be lexemes, meaning they don't obey a skipper.

    I guessed that I should probably add the space character to the acceptable character set inside a terminal (string literal). If that's not what you want, just remove the last character from char_("[][]()<>\'\"=|.,;_ ");.

  • I also changed the skipper to be blank_type as that will not skip newline characters. You could use this to easily parse multi-line input directly with the same grammar. Do, e.g.:

    rules = rule % qi::eol;
    

See also: Boost spirit skipper issues for additional information regarding skippers, lexemes and their interaction(s).

Without further ado, here is a working example:

Live On Coliru

#define BOOST_SPIRIT_DEBUG

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace Parser
{
    namespace qi    = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    enum class ExprType { Terminal, Identifier, Compound };

    static inline std::ostream& operator<<(std::ostream& os, ExprType type) {
        switch (type) {
            case ExprType::Terminal:   return os << "Terminal";
            case ExprType::Identifier: return os << "Identifier";
            case ExprType::Compound:   return os << "Compound";
        }
        return os << "(unknown)";
    }

    struct Expression { // TODO make recursive (see `boost::make_recursive_variant`)
        ExprType    type;
        std::string value;
    };

    using ExprList = std::vector<Expression>;

    struct Rule {
        std::string identifier; // lhs
        ExprList    expr_list;
    };
}

//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)

namespace Parser
{
    typedef std::list<Rule> RuleList;

    //our grammar definition
    template <typename Iterator>
    struct Grammar: qi::grammar<Iterator, RuleList(), ascii::blank_type>
    {
        Grammar(): Grammar::base_type(rules)
        {
            qi::char_type char_;

            symbol            = char_("[][]()<>\'\"=|.,;_ ");
            character         = qi::alpha | qi::digit | symbol;
            identifier        = qi::alpha >> *(qi::alnum | char_('_'));


            // TODO capture strings including interpunction(?)
            terminal          = ('\'' >> +(character - '\'') >> '\'') 
                              | ('\"' >> +(character - '\"') >> '\"');

            expression        = qi::attr(Parser::ExprType::Terminal)   >> terminal 
                              | qi::attr(Parser::ExprType::Identifier) >> identifier 
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ] 
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ] 
                              ;

            expr_list         = expression % (char_("|,")) // TODO FIXME?
                              ;
                                    // above accepts mixed separators:
                                    //     a, b, c | d, e
                                    //
                                    // original accepted:
                                    //
                                    //     a, b, [ c | d ], e
                                    //     a| b| [ c , d ]| e
                                    //     a| b| [ c | d ]| e
                                    //     a, b, [ c , d ], e

            rule              = identifier >> '=' >> expr_list;
            //rules           = rule % qi::eol; // alternatively, parse multi-line input in one go
            rules             = +rule;

            BOOST_SPIRIT_DEBUG_NODES((rules)(rule)(expr_list)(expression)(identifier)(terminal))
        }

    private:
        qi::rule<Iterator, Expression(),  ascii::blank_type> expression;
        qi::rule<Iterator, ExprList(),    ascii::blank_type> expr_list;
        qi::rule<Iterator, Rule(),        ascii::blank_type> rule;
        qi::rule<Iterator, RuleList(),    ascii::blank_type> rules;
        // lexemes:
        qi::rule<Iterator, std::string()> terminal, identifier;
        qi::rule<Iterator, char()>        symbol, character;
    };

}

int main() {
    using It = std::string::const_iterator;

    Parser::Grammar<It> parser;
    boost::spirit::ascii::blank_type blank;

    std::string input;

    while (std::getline(std::cin, input))
    {
        if (input.empty()) {
            break;
        }

        It it = input.begin(), itEnd = input.end();

        Parser::RuleList output;
        bool result = phrase_parse(it, itEnd, parser, blank, output);
        if (result) {
            std::cout << "success\n";
            for (auto& rule : output) {
                std::cout << "\ntarget: " << rule.identifier << "\n";
                for (auto& rhs : rule.expr_list) {
                    std::cout << "rhs:    " << boost::fusion::as_vector(rhs) << "\n";
                }
            }
        } else {
            std::cout << "parse failed\n";
        }

        if (it != itEnd)
            std::cout << "remaining unparsed: '" << std::string(it, itEnd) << "\n";
    }
}

Prints the output:

success

target: assigned1
rhs:    (Identifier some_var)
rhs:    (Terminal a 'string' value)
rhs:    (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs:    (Terminal done)
success

target: assigned2
rhs:    (Compound { a })

And with debug enabled (#define BOOST_SPIRIT_DEBUG):

<rules>
<try>assigned1 = some_var</try>
<rule>
    <try>assigned1 = some_var</try>
    <identifier>
    <try>assigned1 = some_var</try>
    <success> = some_var | "a 'st</success>
    <attributes>[[a, s, s, i, g, n, e, d, 1]]</attributes>
    </identifier>
    <expr_list>
    <try> some_var | "a 'stri</try>
    <expression>
        <try> some_var | "a 'stri</try>
        <terminal>
        <try>some_var | "a 'strin</try>
        <fail/>
        </terminal>
        <identifier>
        <try>some_var | "a 'strin</try>
        <success> | "a 'string' value</success>
        <attributes>[[s, o, m, e, _, v, a, r]]</attributes>
        </identifier>
        <success> | "a 'string' value</success>
        <attributes>[[Identifier, [s, o, m, e, _, v, a, r]]]</attributes>
    </expression>
    <expression>
        <try> "a 'string' value" </try>
        <terminal>
        <try>"a 'string' value" |</try>
        <success> | [ 'okay', "this",</success>
        <attributes>[[a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]]</attributes>
        </terminal>
        <success> | [ 'okay', "this",</success>
        <attributes>[[Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]]]</attributes>
    </expression>
    <expression>
        <try> [ 'okay', "this", i</try>
        <terminal>
        <try>[ 'okay', "this", is</try>
        <fail/>
        </terminal>
        <identifier>
        <try>[ 'okay', "this", is</try>
        <fail/>
        </identifier>
        <expr_list>
        <try> 'okay', "this", is_</try>
        <expression>
            <try> 'okay', "this", is_</try>
            <terminal>
            <try>'okay', "this", is_a</try>
            <success>, "this", is_another</success>
            <attributes>[[o, k, a, y]]</attributes>
            </terminal>
            <success>, "this", is_another</success>
            <attributes>[[Terminal, [o, k, a, y]]]</attributes>
        </expression>
        <expression>
            <try> "this", is_another,</try>
            <terminal>
            <try>"this", is_another, </try>
            <success>, is_another, identi</success>
            <attributes>[[t, h, i, s]]</attributes>
            </terminal>
            <success>, is_another, identi</success>
            <attributes>[[Terminal, [t, h, i, s]]]</attributes>
        </expression>
        <expression>
            <try> is_another, identif</try>
            <terminal>
            <try>is_another, identifi</try>
            <fail/>
            </terminal>
            <identifier>
            <try>is_another, identifi</try>
            <success>, identifier, { ( "n</success>
            <attributes>[[i, s, _, a, n, o, t, h, e, r]]</attributes>
            </identifier>
            <success>, identifier, { ( "n</success>
            <attributes>[[Identifier, [i, s, _, a, n, o, t, h, e, r]]]</attributes>
        </expression>
        <expression>
            <try> identifier, { ( "ne</try>
            <terminal>
            <try>identifier, { ( "nes</try>
            <fail/>
            </terminal>
            <identifier>
            <try>identifier, { ( "nes</try>
            <success>, { ( "nested" ) } ]</success>
            <attributes>[[i, d, e, n, t, i, f, i, e, r]]</attributes>
            </identifier>
            <success>, { ( "nested" ) } ]</success>
            <attributes>[[Identifier, [i, d, e, n, t, i, f, i, e, r]]]</attributes>
        </expression>
        <expression>
            <try> { ( "nested" ) } ] </try>
            <terminal>
            <try>{ ( "nested" ) } ] |</try>
            <fail/>
            </terminal>
            <identifier>
            <try>{ ( "nested" ) } ] |</try>
            <fail/>
            </identifier>
            <expr_list>
            <try> ( "nested" ) } ] | </try>
            <expression>
                <try> ( "nested" ) } ] | </try>
                <terminal>
                <try>( "nested" ) } ] | "</try>
                <fail/>
                </terminal>
                <identifier>
                <try>( "nested" ) } ] | "</try>
                <fail/>
                </identifier>
                <expr_list>
                <try> "nested" ) } ] | "d</try>
                <expression>
                    <try> "nested" ) } ] | "d</try>
                    <terminal>
                    <try>"nested" ) } ] | "do</try>
                    <success> ) } ] | "done"</success>
                    <attributes>[[n, e, s, t, e, d]]</attributes>
                    </terminal>
                    <success> ) } ] | "done"</success>
                    <attributes>[[Terminal, [n, e, s, t, e, d]]]</attributes>
                </expression>
                <success> ) } ] | "done"</success>
                <attributes>[[[Terminal, [n, e, s, t, e, d]]]]</attributes>
                </expr_list>
                <success> } ] | "done"</success>
                <attributes>[[Compound, [(,  , ", n, e, s, t, e, d, ",  , )]]]</attributes>
            </expression>
            <success> } ] | "done"</success>
            <attributes>[[[Compound, [(,  , ", n, e, s, t, e, d, ",  , )]]]]</attributes>
            </expr_list>
            <success> ] | "done"</success>
            <attributes>[[Compound, [{,  , (,  , ", n, e, s, t, e, d, ",  , ),  , }]]]</attributes>
        </expression>
        <success> ] | "done"</success>
        <attributes>[[[Terminal, [o, k, a, y]], [Terminal, [t, h, i, s]], [Identifier, [i, s, _, a, n, o, t, h, e, r]], [Identifier, [i, d, e, n, t, i, f, i, e, r]], [Compound, [{,  , (,  , ", n, e, s, t, e, d, ",  , ),  , }]]]]</attributes>
        </expr_list>
        <success> | "done"</success>
        <attributes>[[Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]]]</attributes>
    </expression>
    <expression>
        <try> "done"</try>
        <terminal>
        <try>"done"</try>
        <success></success>
        <attributes>[[d, o, n, e]]</attributes>
        </terminal>
        <success></success>
        <attributes>[[Terminal, [d, o, n, e]]]</attributes>
    </expression>
    <success></success>
    <attributes>[[[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]</attributes>
    </expr_list>
    <success></success>
    <attributes>[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]]</attributes>
</rule>
<rule>
    <try></try>
    <identifier>
    <try></try>
    <fail/>
    </identifier>
    <fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]]]</attributes>
</rules>
success

target: assigned1
rhs:    (Identifier some_var)
rhs:    (Terminal a 'string' value)
rhs:    (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs:    (Terminal done)
<rules>
<try>assigned2 = { a }</try>
<rule>
    <try>assigned2 = { a }</try>
    <identifier>
    <try>assigned2 = { a }</try>
    <success> = { a }</success>
    <attributes>[[a, s, s, i, g, n, e, d, 2]]</attributes>
    </identifier>
    <expr_list>
    <try> { a }</try>
    <expression>
        <try> { a }</try>
        <terminal>
        <try>{ a }</try>
        <fail/>
        </terminal>
        <identifier>
        <try>{ a }</try>
        <fail/>
        </identifier>
        <expr_list>
        <try> a }</try>
        <expression>
            <try> a }</try>
            <terminal>
            <try>a }</try>
            <fail/>
            </terminal>
            <identifier>
            <try>a }</try>
            <success> }</success>
            <attributes>[[a]]</attributes>
            </identifier>
            <success> }</success>
            <attributes>[[Identifier, [a]]]</attributes>
        </expression>
        <success> }</success>
        <attributes>[[[Identifier, [a]]]]</attributes>
        </expr_list>
        <success></success>
        <attributes>[[Compound, [{,  , a,  , }]]]</attributes>
    </expression>
    <success></success>
    <attributes>[[[Compound, [{,  , a,  , }]]]]</attributes>
    </expr_list>
    <success></success>
    <attributes>[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{,  , a,  , }]]]]]</attributes>
</rule>
<rule>
    <try></try>
    <identifier>
    <try></try>
    <fail/>
    </identifier>
    <fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{,  , a,  , }]]]]]]</attributes>
</rules>
success

target: assigned2
rhs:    (Compound { a })

Lose Ends

There are some TODO's left in the code. Implementing them would require some more effort, but I'm not sure it's actually the right direction to go in, so I'll await feedback :)

One of the elementary TODO's left would be to represent the recursive nature of expressions in the AST proper. Right now, I "copped out" by just inserting the source string for a nested Compound expression.

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • "did you mean `list()` instead of `list`?" They are supposed to be functionally equivalent now, you might be seeing an implementation bug. – K-ballo Sep 26 '15 at 21:18
  • Possibly. Likely, even, because, if you look at the rest of the code there was an absolute mismatch with the attributes exposed and synthesized in both `rhs` and the grammar itself... – sehe Sep 26 '15 at 21:19
  • I did not look at the rest of the code. That line just caught my eye, because after years of people being bit by it, that design issue of `T` vs `T()` is finally supposed to be solved. – K-ballo Sep 26 '15 at 21:22
  • @K-ballo I know this. I just didn't make assumptions about the boost version used and extrapolated from the fact the OP was even able to compile his code :) That shouldn't be possible in bleeding edge boost then – sehe Sep 26 '15 at 21:23