
The following parser handles strings such as "a \"quoted\" string", but strips out the escaped quotes, leaving "a quoted string". Why, and is it possible to prevent it from doing that, or is this the only way?

template <typename IteratorT, typename SkipperT>
struct quoted_string_grammar
    : qi::grammar<IteratorT, std::string(), SkipperT >
{
    quoted_string_grammar()
        : quoted_string_grammar::base_type(rule, "String")
    {
        using namespace qi;

        rule %= lexeme [
            lit(L'"')
            >> *(lit("\\\"") | (char_ - char_('"')))
            >  lit('"')
        ];
    }

    qi::rule<IteratorT, std::string(), SkipperT> rule;
};


sehe
Keith M

1 Answer


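The escaped quotes disappear because lit(...) exposes no attribute: the \" sequence is matched, but it contributes nothing to the synthesized string. In principle, though, you want the escapes to be interpreted during parsing.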

Very rare exceptions include cases where you intend to "only validate" and forward the same input. In that case, however, you wouldn't want any attribute at all (which is simple in Spirit: just don't pass one).

Also, passing escaped input through uninterpreted is a security smell, because you should probably never trust your input.

There's some other weirdness:

  • you have a grammar with a skipper, but its only rule is entirely lexeme (see Boost spirit skipper issues).
  • you handle \" but \ has no magic meaning otherwise. That's confusing.
  • you have redundant lit() wrapping the character literals
  • char_ - char_('"') could (should?) be written more efficiently as ~char_('"')
  • there's a stray wide-character literal

Fixing all these issues, I'd write the whole thing as

qi::rule<Iterator, std::string()> rule;
rule = '"' >> *~char_('"') >> '"';

With escapes, I'd write

rule = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';

To expose the raw input:

rule = raw['"' >> *('\\' >> char_ | ~char_('"')) >> '"'];

And you can drop the entire grammar struct.

Illustrative Demo

No answer is complete without a live demo. In particular, it highlights a few of the oddities noted above.

Live On Coliru (http://coliru.stacked-crooked.com/a/bfb9eda7cfe3d793)

#include <boost/spirit/include/qi.hpp>
#include <iomanip>   // std::quoted
#include <iostream>  // std::cout
#include <string>
namespace qi = boost::spirit::qi;

std::string parse(std::string const& input) {
    std::string result;
    static const qi::rule<std::string::const_iterator, std::string()> rule
        = '"' >> *('\\' >> qi::char_ | ~qi::char_('"')) >> '"';

    // eps > rule > eoi: throws qi::expectation_failure unless the whole input is one quoted string
    qi::parse(input.begin(), input.end(), qi::eps > rule > qi::eoi, result);
    return result;
}

int main() {
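    // take the argument by reference: std::quoted keeps a reference to it in
    // common implementations, so a by-value parameter would dangle on return
    auto sq = [](auto const& s) { return std::quoted(s, '\''); };
    auto dq = [](auto const& s) { return std::quoted(s, '"'); };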

    for (std::string s : {
            R"("")",
            R"("hello")",
            R"("hello \"world\"! ")",
            R"("hello \'world\'! ")",
    }) {
        std::cout <<    s  << " -> " <<    parse(s)  << "\n";
        std::cout << sq(s) << " -> " << sq(parse(s)) << "\n";
        std::cout << dq(s) << " -> " << dq(parse(s)) << "\n";
        std::cout << "----\n";
    }
}

Prints

"" -> 
'""' -> ''
"\"\"" -> ""
----
"hello" -> hello
'"hello"' -> 'hello'
"\"hello\"" -> "hello"
----
"hello \"world\"! " -> hello "world"! 
'"hello \\"world\\"! "' -> 'hello "world"! '
"\"hello \\\"world\\\"! \"" -> "hello \"world\"! "
----
"hello \'world\'! " -> hello 'world'! 
'"hello \\\'world\\\'! "' -> 'hello \'world\'! '
"\"hello \\'world\\'! \"" -> "hello 'world'! "
----

I'd like for this to be a Zen Koan. And the Koan ends:

The disciple meditated on the output of the code for 37 days, and then he walked away enlightened.

sehe
  • No answer is complete without a [live demo](http://coliru.stacked-crooked.com/a/bfb9eda7cfe3d793). In particular, this one highlights a few of the noted oddities above. – sehe Apr 25 '21 at 22:06
  • I woke up in the middle of the night and thought d'oh! I should have used the string instead of the lit parser, as I wanted the attribute. However, that got me into more trouble with an infinite loop, so I was relieved to see your elegant solution. As I need everything between matching quotes, I ended up with this: rule = '"' >> *('\\' >> char_ | ~char_('"')) > '"'; Thanks very much for taking the time to create that demo, and explain where I was going wrong! – Keith M Apr 26 '21 at 09:08