4

I'm hoping someone can shine a light through my ignorance of using the > and >> operators in spirit parsing.

I have a working grammar, where the top-level rule looks like

test = identifier >> operationRule >> repeat(1,3)[any_string] >> arrow >> any_string >> conditionRule;

It relies on attributes to automatically allocated parsed values to a fusion-adapted struct (ie a boost tuple).

However, I know that once we match the operationRule, we must continue or fail (ie we don't want to allow backtracking to try other rules that begin with identifier).

test = identifier >> 
           operationRule > repeat(1,3)[any_string] > arrow > any_string > conditionRule;

This causes a cryptic compiler error ('boost::Container' : use of class template requires template argument list). Futz around a bit and the following compiles:

test = identifier >> 
           (operationRule > repeat(1,3)[any_string] > arrow > any_string > conditionRule);

but the attribute setting no longer works - my data structure contains garbage after parsing. This can be fixed by adding actions like [at_c<0>(_val) = _1], but that seems a little clunky - as well as making things slower according to the boost docs.

So, my questions are

  1. Is it worth preventing back-tracking?
  2. Why do I need the grouping operator ()
  3. Does my last example above really stop back-tracking after operationRule is matched (I suspect not, it seems that if the entire parser inside the (...) fails backtracking will be allowed)?
  4. If the answer to the previous question is /no/, how do I construct a rule that allows backtracking if operation is /not/ matched, but does not allow backtracking once operation /is/ matched?
  5. Why does the grouping operator destroy the attribute grammar - requiring actions?

I realise this is quite a broad question - any hints that point in the right direction will be greatly appreciated!

Zero
  • 11,593
  • 9
  • 52
  • 70
  • A more complete description of my grammar is in a previous question [here](http://stackoverflow.com/questions/10289985/parse-quoted-strings-with-boostspirit). I don't think this is relevant to the question, which is why I didn't put it in. – Zero Apr 30 '12 at 05:41

1 Answers1

5
  1. Is it worth preventing back-tracking?

    Absolutely. Preventing back tracking in general is a proven way to improve parser performance.

    • reduce the use of (negative) lookahead (operator !, operator - and some operator &)
    • order branches (operator |, operator ||, operator^ and some operator */-/+) such that the most frequent/likely branch is ordered first, or that the most costly branch is tried last

    Using expectation points (>) does not essentially reduce backtracking: it will just disallow it. This will enable targeted error messages, prevent useless 'parsing-into-the-unknown'.

  2. Why do I need the grouping operator ()

    I'm not sure. I had a check using my what_is_the_attr helpers from here

    • ident >> op >> repeat(1,3)[any] >> "->" >> any
      synthesizes attribute:

      fusion::vector4<string, string, vector<string>, string>
      
    • ident >> op > repeat(1,3)[any] > "->" > any
      synthesizes attribute:

      fusion::vector3<fusion::vector2<string, string>, vector<string>, string>
      

    I haven't found the need to group subexpressions using parentheses (things compile), but obviously DataT needs to be modified to match the changed layout.

    typedef boost::tuple<
        boost::tuple<std::string, std::string>, 
        std::vector<std::string>, 
        std::string
    > DataT;
    

The full code below shows how I'd prefer to do that, using adapted structs.

  1. Does my above example really stop back-tracking after operationRule is matched (I suspect not, it seems that if the entire parser inside the (...) fails backtracking will be allowed)?

    Absolutely. If the expectation(s) is not met, a qi::expectation_failure<> exception is thrown. This by default aborts the parse. You could use qi::on_error to retry, fail, accept or rethrow. The MiniXML example has very good examples on using expectation points with qi::on_error

  2. If the answer to the previous question is /no/, how do I construct a rule that allows backtracking if operation is /not/ matched, but does not allow backtracking once operation /is/ matched?

  3. Why does the grouping operator destroy the attribute grammar - requiring actions?

    It doesn't destroy the attribute grammar, it just changes the exposed type. So, if you bind an appropriate attribute reference to the rule/grammar, you won't need semantic actions. Now, I feel there should be ways to go without the grouping , so let me try it (preferrably on your short selfcontained sample). And indeed I have found no such need. I've added a full example to help you see what is happening in my testing, and not using semantic actions.

Full Code

The full code show 5 scenarios:

  • OPTION 1: Original without expectations

    (no relevant changes)

  • OPTION 2: with expectations

    Using the modified typedef for DataT (as shown above)

  • OPTION 3: adapted struct, without expectations

    Using a userdefined struct with BOOST_FUSION_ADAPT_STRUCT

  • OPTION 4: adapted struct, with expectations

    Modifying the adapted struct from OPTION 3

  • OPTION 5: lookahead hack

    This one leverages a 'clever' (?) hack, by making all >> into expectations, and detecting the presence of a operationRule-match beforehand. This is of course suboptimal, but allows you to keep DataT unmodified, and without using semantic actions.

Obviously, define OPTION to the desired value before compiling.

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>

namespace qi    = boost::spirit::qi; 
namespace karma = boost::spirit::karma; 

#ifndef OPTION
#define OPTION 5
#endif

#if OPTION == 1 || OPTION == 5 // original without expectations (OR lookahead hack)
    typedef boost::tuple<std::string, std::string, std::vector<std::string>, std::string> DataT;
#elif OPTION == 2 // with expectations
    typedef boost::tuple<boost::tuple<std::string, std::string>, std::vector<std::string>, std::string> DataT;
#elif OPTION == 3 // adapted struct, without expectations
    struct DataT
    {
        std::string identifier, operation;
        std::vector<std::string> values;
        std::string destination;
    };

    BOOST_FUSION_ADAPT_STRUCT(DataT, (std::string, identifier)(std::string, operation)(std::vector<std::string>, values)(std::string, destination));
#elif OPTION == 4 // adapted struct, with expectations
    struct IdOpT
    {
        std::string identifier, operation;
    };
    struct DataT
    {
        IdOpT idop;
        std::vector<std::string> values;
        std::string destination;
    };

    BOOST_FUSION_ADAPT_STRUCT(IdOpT, (std::string, identifier)(std::string, operation));
    BOOST_FUSION_ADAPT_STRUCT(DataT, (IdOpT, idop)(std::vector<std::string>, values)(std::string, destination));
#endif

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, DataT(), qi::space_type, qi::locals<char> >
{
    test_parser() : test_parser::base_type(test, "test")
    {
        using namespace qi;

        quoted_string = 
               omit    [ char_("'\"") [_a =_1] ]             
            >> no_skip [ *(char_ - char_(_a))  ]
             > lit(_a); 

        any_string = quoted_string | +qi::alnum;

        identifier = lexeme [ alnum >> *graph ];

        operationRule = string("add") | "sub";
        arrow = "->";

#if OPTION == 1 || OPTION == 3   // without expectations
        test = identifier >> operationRule >> repeat(1,3)[any_string] >> arrow >> any_string;
#elif OPTION == 2 || OPTION == 4 // with expectations
        test = identifier >> operationRule  > repeat(1,3)[any_string]  > arrow  > any_string;
#elif OPTION == 5                // lookahead hack
        test = &(identifier >> operationRule) > identifier > operationRule > repeat(1,3)[any_string] > arrow > any_string;
#endif
    }

    qi::rule<Iterator, qi::space_type/*, qi::locals<char> */> arrow;
    qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> operationRule;
    qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> identifier;
    qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
    qi::rule<Iterator, DataT(),       qi::space_type, qi::locals<char> > test;
};

int main()
{
    std::string str("addx001 add 'str1'   \"str2\"       ->  \"str3\"");
    test_parser<std::string::const_iterator> grammar;
    std::string::const_iterator iter = str.begin();
    std::string::const_iterator end  = str.end();

    DataT data;
    bool r = phrase_parse(iter, end, grammar, qi::space, data);

    if (r)
    {
        using namespace karma;
        std::cout << "OPTION " << OPTION << ": " << str << " --> ";
#if OPTION == 1 || OPTION == 3 || OPTION == 5 // without expectations (OR lookahead hack)
        std::cout << format(delimit[auto_ << auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
#elif OPTION == 2 || OPTION == 4 // with expectations
        std::cout << format(delimit[auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
#endif
    }
    if (iter!=end)
        std::cout << "Remaining: " << std::string(iter,end) << "\n";
}

Output for all OPTIONS:

for a in 1 2 3 4 5; do g++ -DOPTION=$a -I ~/custom/boost/ test.cpp -o test$a && ./test$a; done
OPTION 1: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 2: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 3: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 4: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
OPTION 5: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Hello again @sehe! I will work on putting together a short example - it will probably take 24 hours. Regarding point 3, I'm not sure you get my meaning. I was wondering if `a >> (b > c > d)` isn't equivalent to `a >> z` with `z = b > c > d` and if so, does this mean that if `z` fails *or* (b > c > d)` fails backtracking is allowed? The subtle point being that if `d` fails, then `(b > c > d)` has failed, meaning we can backtrack even though `b` passed, which is not the desired effect. At any rate, a smaller example will let me experiment a bit more freely, and maybe find some answers myself! – Zero Apr 30 '12 at 22:11
  • @Zero: regarding your comment, if d fails, an exception is thrown. That should clear up the confusion, as to what will happen (assuming no special `on_error` handling) :) – sehe Apr 30 '12 at 23:34
  • @Zero: regarding the sample, I think I pretty comprehensively covered the options now, including full sample with no less than **5 different options** of how to handle the attributes/expectations. You might be interested in my **lookahead hack** (option 5), which in essence let's you have your cake and eat it, too – sehe Apr 30 '12 at 23:35