
I would like to parse a sentence where some strings may be unquoted, 'quoted' or "quoted". The code below almost works, but it fails to match closing quotes. I'm guessing this is because of the qq reference. A modification is commented out in the code; with it enabled, "quoted' or 'quoted" also parses, which helps show that the original problem is with the closing quote. The code also describes the exact grammar.

To be completely clear: unquoted strings parse. A quoted string like 'hello' will parse the open quote ', all the characters hello, but then fail to parse the final quote '.

I made another attempt, similar to the begin/end tag matching in the Boost tutorials, but without success.

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, dectest::Test(), ascii::space_type>
{
    test_parser()
        :
    test_parser::base_type(test, "test")
    {
        using qi::fail;
        using qi::on_error;
        using qi::lit;
        using qi::lexeme;
        using ascii::char_;
        using qi::repeat;
        using namespace qi::labels;
        using boost::phoenix::construct;
        using boost::phoenix::at_c;
        using boost::phoenix::push_back;
        using boost::phoenix::val;
        using boost::phoenix::ref;
        using qi::space;

        char qq;          

        arrow = lit("->");

        open_quote = (char_('\'') | char_('"')) [ref(qq) = _1];  // Remember what the opening quote was
        close_quote = lit(val(qq));  // Close must match the open
        // close_quote = (char_('\'') | char_('"')); // Enable this line to get code 'almost' working

        quoted_string = 
            open_quote
            >> +ascii::alnum        
            >> close_quote; 

        unquoted_string %= +ascii::alnum;
        any_string %= (quoted_string | unquoted_string);

        test = 
            unquoted_string             [at_c<0>(_val) = _1] 
            > unquoted_string           [at_c<1>(_val) = _1]   
            > repeat(1,3)[any_string]   [at_c<2>(_val) = _1]
            > arrow
            > any_string                [at_c<3>(_val) = _1] 
            ;

        // .. <snip>set rule names
        on_error<fail>(/* <snip> */);
        // debug rules
    }

    qi::rule<Iterator> arrow;
    qi::rule<Iterator> open_quote;
    qi::rule<Iterator> close_quote;

    qi::rule<Iterator, std::string()> quoted_string;
    qi::rule<Iterator, std::string()> unquoted_string;
    qi::rule<Iterator, std::string()> any_string;     // A quoted or unquoted string

    qi::rule<Iterator, dectest::Test(), ascii::space_type> test;

};


// main()
// This example should fail at the very end
// (i.e. it should not parse "str3' because of the mismatched quote).
// However, it already fails to parse the closing quote of 'str1'.
typedef boost::tuple<string, string, vector<string>, string> DataT;
DataT data;
std::string str("addx001 add 'str1'   \"str2\"       ->  \"str3'");
std::string::const_iterator iter = str.begin();
const std::string::const_iterator end = str.end();
bool r = phrase_parse(iter, end, grammar, boost::spirit::ascii::space, data);

For bonus credit: A solution that avoids a local data member (such as char qq in the above example) would be preferred, but from a practical point of view I'll use anything that works!

Zero
  • For the record, making `char qq` a member variable of `struct test_parser` fails in exactly the same way. – Zero Apr 24 '12 at 00:11
  • Fails in what "same way?" You haven't told us how this one fails (though I can imagine it is due to the `qq` reference). – Nicol Bolas Apr 24 '12 at 00:16
  • @NicolBolas It was a comment in the code - I've since clarified the question, thanks for pointing that out. I also suspect the ref(qq), but the downside of boost lambda&co is that they are tricky to debug, as you can't step through in the traditional sense! – Zero Apr 24 '12 at 00:30

1 Answer


The reference to qq becomes dangling after leaving the constructor, so that is indeed a problem.

qi::locals is the canonical way to keep local state inside parser expressions. Your other option would be to extend the lifetime of qq (by making it a member of the grammar class, e.g.). Lastly, you might be interested in inherited attributes as well. This mechanism gives you a way to call a rule/grammar with 'parameters' (passing local state around).

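NOTE: There is a caveat with the repetition operators (Kleene star * and plus +): they are greedy, so the repeated parser must not be able to consume the closing quote itself. Parsing also fails if the string is not terminated with the expected quote.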

See another answer I wrote for more complete examples of treating arbitrary contents in (optionally/partially) quoted strings, that allow escaping of quotes inside quoted strings and more things like that:

I've reduced the grammar to the relevant bit, and included a few test cases:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>   // std::cout
#include <string>     // std::string

namespace qi = boost::spirit::qi;

template <typename Iterator>
struct test_parser : qi::grammar<Iterator, std::string(), qi::space_type, qi::locals<char> >
{
    test_parser() : test_parser::base_type(any_string, "test")
    {
        using namespace qi;

        quoted_string = 
               omit    [ char_("'\"") [_a =_1] ]             
            >> no_skip [ *(char_ - char_(_a))  ]
            >> lit(_a)
        ; 

        any_string = quoted_string | +qi::alnum;
    }

    qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
};

int main()
{
    test_parser<std::string::const_iterator> grammar;
    const char* strs[] = { "\"str1\"", 
                           "'str2'",
                           "'str3' trailing ok",
                           "'st\"r4' embedded also ok",
                           "str5",
                           "str6'",
                           NULL };

    for (const char** it = strs; *it; ++it)
    {
        const std::string str(*it);
        std::string::const_iterator iter = str.begin();
        std::string::const_iterator end  = str.end();

        std::string data;
        bool r = phrase_parse(iter, end, grammar, qi::space, data);

        if (r)
            std::cout << "Parsed:    " << str << " --> " << data << "\n";
        if (iter!=end)
            std::cout << "Remaining: " << std::string(iter,end) << "\n";
    }
}

Output:

Parsed:    "str1" --> str1
Parsed:    'str2' --> str2
Parsed:    'str3' trailing ok --> str3
Remaining: trailing ok
Parsed:    'st"r4' embedded also ok --> st"r4
Remaining: embedded also ok
Parsed:    str5 --> str5
Parsed:    str6' --> str6
Remaining: '
sehe
  • Thanks, this is exactly what I was after. Would you be able to post a link to any documentation/examples about the locals, it took me a while to notice the `qi::local` in the rule signature, and it would be a good reference for me and anyone else looking at this question. – Zero Apr 26 '12 at 03:25
  • @Zero thanks! And, erm **[`qi::locals`](http://www.boost.org/doc/libs/1_48_0/libs/spirit/doc/html/spirit/qi/reference/parser_concepts/nonterminal.html#spirit.qi.reference.parser_concepts.nonterminal.locals)** was a hyperlink in my answer :) - _click it for documentation_ – sehe Apr 26 '12 at 06:33
  • @Zero For a good sample, I'd refer to the page you linked to in your question, notably here: [One More Take](http://www.boost.org/doc/libs/1_49_0/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html#spirit.qi.tutorials.mini_xml___asts_.one_more_take) – sehe Apr 26 '12 at 06:36
  • Aha, got it - at the bottom of [One More Take](http://www.boost.org/doc/libs/1_49_0/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html#spirit.qi.tutorials.mini_xml___asts_.one_more_take) they talk about the 'locals' template parameter. Thanks again. – Zero Apr 27 '12 at 01:02
  • Slightly improved parsing of string literal (accepting any text within the quotes). Now also with fixed test – sehe Apr 30 '12 at 21:42
  • Thanks for the edit - I was battling with the quoted_string rule just yesterday, although the answer was so obvious I prefer to put it down to lack of coffee than lack of intelligence ;) – Zero Apr 30 '12 at 22:29