I'm trying to parse a quoted string with escape sequences using Boost::Spirit. Unfortunately, it seems that including the quotes in the grammar definition causes massive(-ly unhelpful) compile-time errors (as one might expect with Boost). Omitting quotes lets the program compile, but obviously it won't behave as it's supposed to. This is the code (actually part of a bigger picture, but it demonstrates the issue):

#include "boost/spirit/include/qi.hpp"
#include "boost/proto/deep_copy.hpp"
#include "boost/optional.hpp"

#include <iostream>
#include <string>

using boost::spirit::qi::char_;
using boost::spirit::qi::lexeme;
using boost::proto::deep_copy;


auto string_literal = deep_copy(
    lexeme[
            // char_('"')
            /* >> */ *((char_ - '"' - '\\') | (char_('\\') >> char_))
            // >> char_('"')
          ]);


template <class Iterator, class Grammar>
boost::optional<std::string> parse_string(Iterator first, Iterator last, Grammar&& gr)
{
    using boost::spirit::qi::space;
    using boost::spirit::qi::phrase_parse;

    std::string temp;
    bool success = phrase_parse(
        first,
        last,
        gr,
        space,
        temp
    );

    if (first == last && success)
        return temp;
    else return boost::none;
}


int main()
{
    std::string str;
    std::cout << "string_literal: ";

    getline(std::cin, str);

    auto presult = parse_string(str.begin(), str.end(), string_literal);
    if (presult) {
        std::cout << "parsed: " << *presult;
    } else
        std::cout << "failure\n";

    return 0;
}

Uncommenting the commented parts of string_literal's definition causes errors. In its current state (with comments) the code compiles. I've tried several things such as moving the quotes into parse_string, as well as using a less specific definition (the one above is the least specific I could come up with that was still useful, the correct grammar is in the OCaml language manual, but I figured I can just validate escape sequences separately), but nothing worked.

My Boost version is 1.56.0, and my compiler is MinGW-w64 g++ 4.9.1. Any help at all most appreciated.

More Axes

1 Answer

It took me a little while to see it.

The problem is - ultimately - with the fact that[1]

(qi::char_('\\') >> qi::char_) | (qi::char_ - '"')

synthesizes to

boost::variant<
    boost::fusion::vector2<char, char>,
    char>

and not, as you likely expected, char or std::vector<char>. The attribute compatibility rules of Spirit are near-magic and they let you get away with it (that's pretty damn nifty), but they also hid the problem from your consciousness.

Only to complain about it when you further complicated the rule.

Now I can see two possible routes: Either you want to return the de-escaped string value (without the quotes) and you change it to:[2]

    qi::lexeme [
            '"' >>
                *(('\\' >> qi::char_) | (qi::char_ - '"'))
            >> '"'
        ]

Or you want to capture the raw string with quotes and you don't care about the exposed attributes at all:

    qi::raw [
            '"' >>
                *(('\\' >> qi::char_) | (qi::char_ - '"'))
            >> '"'
        ]

The latter uses the implicit attribute transformation from source-iterator pair (qi::raw[]) to std::string (the bound attribute).

See the full thing live:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/proto/deep_copy.hpp>
#include <boost/optional.hpp>

#include <iostream>
#include <string>
#include <utility>

namespace qi = boost::spirit::qi;

namespace {

    auto string_literal = boost::proto::deep_copy(
#if 1
        qi::lexeme [
                '"' >>
                    *(('\\' >> qi::char_) | (qi::char_ - '"'))
                >> '"'
            ]
#else
        qi::raw [
                '"' >>
                    *(('\\' >> qi::char_) | (qi::char_ - '"'))
                >> '"'
            ]
#endif
        );

}

template <class Iterator, class Grammar>
boost::optional<std::string> parse_string(Iterator first, Iterator last, Grammar&& gr)
{
    std::string temp;

    bool success = qi::phrase_parse(
        first,
        last,
        std::forward<Grammar>(gr),
        qi::space,
        temp
    );

    if (success && first == last)
        return temp;
    else return boost::none;
}


int main()
{
    std::string str;
    std::cout << "string_literal: ";

    getline(std::cin, str);

    auto presult = parse_string(str.begin(), str.end(), string_literal);
    if (presult) {
        std::cout << "parsed: '" << *presult << "'\n";
    } else
        std::cout << "failure\n";

    return 0;
}

[1] slightly simplified by reordering branches

[2] (note that '\\' is equivalent to qi::lit('\\') by implicit conversions of the expression template operands)

sehe
  • Thank you. I'm honestly not sure what to think of Spirit anymore, all these gotchas are getting tiring and the documentation is rather lacking, I think. Is this something I would have known had I read the documentation from cover to cover? Regardless, this was very helpful - I'll most likely go with the `qi::raw[]` approach since my current design treats all literals as raw strings initially and then uses a separate function to "extract" the actual values from them. – More Axes Dec 25 '14 at 01:20
  • I guess you would have known: I just combined the http://tinyurl.com/alternative-attributes with http://tinyurl.com/sequence-attributes docs to arrive at `variant<vector2<char, char>, char>` there. If you like there are **[ways to detect it](http://stackoverflow.com/questions/9404189/detecting-the-parameter-types-in-a-spirit-semantic-action)**. That said, yes, Spirit has its pitfalls (although you're pushing them a bit by not adhering to the common idioms, e.g. using `deep_copy` that way is skirting the edges of what's possible in Spirit v2; I liked the resulting style, by the way). – sehe Dec 25 '14 at 01:25
  • It took me about 10 minutes to actually see the problem here, so you won't hear me say there's no issue. Then again, I know Spirit quite well, and I would not have written this bug myself (I'd realize the effect of `char_('\\')` on synthesized attributes while writing it). So maybe you just have to give it some time. There are definitely trade-offs and I've made cases _against_ Spirit in the past. But I still view it as a very valuable tool for some jobs. I wouldn't want to be without it. – sehe Dec 25 '14 at 01:28
  • In my defense, the `deep_copy` is only there to save some typing, since I don't really feel like I'm well-versed enough in Spirit to do it the idiomatic way. Which, I assume, is to make a `struct` inheriting from `qi::grammar`, with a `qi::rule` field and a default constructor, I think. Could you provide an example of what might an idiomatic `string_literal` parser look like? – More Axes Dec 25 '14 at 01:31
  • The main purpose of having the wrapping struct is to allow instantiation with different iterator types: **[like so](http://coliru.stacked-crooked.com/a/c8d845b9b790607a)**. So you could just treat it as a "container of rule(s)" (non-idiomatic) or pass a template template argument (non-idiomatic). I'd stick with the `grammar` (first approach). – sehe Dec 25 '14 at 01:50
  • They're both non-idiomatic? – More Axes Dec 25 '14 at 01:53
  • @MoreAxes The linked sample is an idiomatic sample for the grammar as you requested. The alternatives (not shown) would be non-idiomatic. Note that Spirit X3 will not require this kind of "red tape" anymore (neither will it require the `deep_copy` hack anymore, so you can freely `auto` declare everything you want). – sehe Dec 25 '14 at 01:56
  • I've heard about X3 simplifying things a lot, but its github repository refers to itself as "experimental", which is less than encouraging. Anyway, thank you again for your time, this was very helpful. – More Axes Dec 25 '14 at 02:01
  • @MoreAxes I was just saying it in case you want to consider postponing learning more until Spirit X3 becomes a thing (that might be years though). Cheers – sehe Dec 25 '14 at 02:03
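
For the `qi::raw[]` route mentioned in the first comment, where a separate function later extracts the value from the raw literal, that second step might look roughly like the sketch below. The function `extract_value` is hypothetical and uses only the standard library. Mapping `\n` and `\t` to control characters is an assumption for illustration; the grammar in the answer takes the character after a backslash verbatim, and the full OCaml escape syntax is not handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: strips the surrounding quotes from a raw literal
// and resolves each backslash escape.
std::string extract_value(const std::string& raw)
{
    if (raw.size() < 2 || raw.front() != '"' || raw.back() != '"')
        throw std::invalid_argument("not a quoted literal");

    std::string out;
    // Walk the characters strictly between the two quotes.
    for (std::size_t i = 1; i + 1 < raw.size(); ++i) {
        char c = raw[i];
        if (c == '\\') {
            // A backslash right before the closing quote escapes it,
            // which leaves the literal unterminated.
            if (i + 2 >= raw.size())
                throw std::invalid_argument("dangling backslash");
            c = raw[++i];
            switch (c) {
                case 'n': c = '\n'; break; // assumed escape, for illustration
                case 't': c = '\t'; break; // assumed escape, for illustration
                default: break;            // any other character is kept as-is
            }
        }
        out += c;
    }
    return out;
}
```

Given the raw literal `"a\"b"`, this returns `a"b`; input without surrounding quotes, or with a backslash directly before the closing quote, throws `std::invalid_argument`.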