
I am using Boost Spirit X3 to parse escaped ASCII strings. I came across this answer, but it throws an expectation exception on my input. In the code below I have changed the expectation operator (`>`) from the original to the sequence operator (`>>`) so that no exception is thrown. When I run it, the parser assigns the correct value to the attribute, but it returns false and does not consume the input. Any ideas what I've done wrong here?

gcc version 10.3.0

boost 1.71

std = c++17

#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>


namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

//changed expectation to sequence
auto const qstring = x3::lexeme['"' >> *(
             "\\n" >> x3::attr('\n')
           | "\\b" >> x3::attr('\b')
           | "\\f" >> x3::attr('\f')
           | "\\t" >> x3::attr('\t')
           | "\\v" >> x3::attr('\v')
           | "\\0" >> x3::attr('\0')
           | "\\r" >> x3::attr('\r')
           | "\\n" >> x3::attr('\n')
           | "\\"  >> x3::char_("\"\\")
           | "\\\"" >> x3::char_('"')
           | ~x3::char_('"')
       ) >> '"'];

int main(int, char**){

    auto const quoted = "\"Hel\\\"lo Wor\\\"ld"s;
    auto const expected = "Hel\"lo Wor\"ld"s;

    std::string result;
    auto first = quoted.begin();
    auto const last = quoted.end();
    bool ok = x3::phrase_parse(first, last, qstring, x3::ascii::space, result);
    std::cout << "parse returned " << std::boolalpha << ok << '\n';

    std::cout << result << " == " << expected << " is " << std::boolalpha << (result == expected) << '\n';

    std::cout << "first == last = " << (first == last) << '\n';
    std::cout << "first = " << *first << '\n';

    return 0;
}
systemcpro

1 Answer


Your input isn't terminated with a quote character. Writing it as a raw string literal makes that easier to see:

std::string const qinput   = R"("Hel\"lo Wor\"ld)";

It should be:

std::string const qinput   = R"("Hel\"lo Wor\"ld")";

The rest is standard container-attribute behavior: in Spirit, when a rule fails (or merely backtracks out of a branch), whatever has already been appended to the container attribute is not rolled back. See e.g. "boost::spirit::qi duplicate parsing on the output" and "Understanding Boost.spirit's string parser".

Basically, you cannot rely on the result if the parse failed. This is likely why the original had an expectation point: to raise an exception.
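
To make the "not rolled back" point concrete without Spirit, here is a minimal hand-written sketch using only the standard library. It is not how X3 is implemented, and the names parse_quoted_into and parse_quoted are made up for this illustration. It also handles only a few of the escapes and, as a simplification, requires the closing quote to be the last character of the input. The first function appends to the caller's string as it goes, so the string is already filled by the time the missing closing quote is noticed. The second one parses into a temporary and only hands it over on success, which is one way to avoid relying on the result of a failed parse.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the qstring rule (NOT Spirit's implementation).
// Appends to `out` while scanning, so `out` keeps whatever was collected
// even when the function ends up returning false.
inline bool parse_quoted_into(std::string_view in, std::string& out) {
    std::size_t i = 0;
    if (in.empty() || in[i] != '"')
        return false; // no opening quote
    ++i;

    while (i < in.size() && in[i] != '"') {
        char c = in[i++];
        if (c == '\\' && i < in.size()) {
            // Only a subset of the escapes, to keep the sketch short
            switch (in[i]) {
                case 'n':  c = '\n'; ++i; break;
                case 't':  c = '\t'; ++i; break;
                case '\\': c = '\\'; ++i; break;
                case '"':  c = '"';  ++i; break;
                default:   break; // unknown escape: keep the backslash itself
            }
        }
        out += c; // appended immediately, never undone
    }

    // Succeed only if the closing quote is present and ends the input
    return i < in.size() && in[i] == '"' && i + 1 == in.size();
}

// Parse into a temporary and commit only on success.
inline std::optional<std::string> parse_quoted(std::string_view in) {
    std::string tmp;
    if (!parse_quoted_into(in, tmp))
        return std::nullopt;
    return tmp;
}
```

With the unterminated input from the question, parse_quoted_into returns false yet leaves Hel"lo Wor"ld in out, while parse_quoted returns an empty optional.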

A full demonstration of the correct behavior:

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <string_view>

namespace x3 = boost::spirit::x3;

auto escapes = []{
    x3::symbols<char> sym;
    sym.add
        ("\\b", '\b')
        ("\\f", '\f')
        ("\\t", '\t')
        ("\\v", '\v')
        ("\\0", '\0')
        ("\\r", '\r')
        ("\\n", '\n')
        ("\\\\", '\\')
        ("\\\"", '"')
        ;
    return sym;
}();

auto const qstring = x3::lexeme['"' >> *(escapes | ~x3::char_('"')) >> '"'];

int main(){
    auto squote = [](std::string_view s) { return std::quoted(s, '\''); };
    std::string const expected = R"(Hel"lo Wor"ld)";

    for (std::string const qinput : {
        R"("Hel\"lo Wor\"ld)", // oops no closing quote
        R"("Hel\"lo Wor\"ld")",
        "\"Hel\\\"lo Wor\\\"ld\"", // if you insist
        R"("Hel\"lo Wor\"ld" trailing data)",
    })
    {
        std::cout << "\n -- input " << squote(qinput) << "\n";
        std::string result;

        auto first = cbegin(qinput);
        auto last  = cend(qinput);
        bool ok    = x3::phrase_parse(first, last, qstring, x3::space, result);

        ok &= (first == last);

        std::cout << "parse returned " << std::boolalpha << ok << "\n";

        std::cout << squote(result) << " == " << squote(expected) << " is "
                  << (result == expected) << "\n";

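        if (first != last)
            // std::string(first, last) also compiles as C++17; a string_view
            // cannot be built from an iterator pair before C++20
            std::cout << "Remaining input unparsed: "
                      << squote(std::string(first, last)) << "\n";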
    }
}

Prints

 -- input '"Hel\\"lo Wor\\"ld'
parse returned false
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
Remaining input unparsed: '"Hel\\"lo Wor\\"ld'

 -- input '"Hel\\"lo Wor\\"ld"'
parse returned true
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true

 -- input '"Hel\\"lo Wor\\"ld"'
parse returned true
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true

 -- input '"Hel\\"lo Wor\\"ld" trailing data'
parse returned false
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
Remaining input unparsed: 'trailing data'
sehe
  • In my update I gratuitously replaced the branching rule with a `symbols` lookup (Trie) and also removed the duplication of the `\\n` branch. – sehe Aug 17 '21 at 23:28
  • Thank you for that. Couldn't see the woods for the trees there. One takeaway I got from this is to use facilities (literals, quote etc.) provided by the standard library and not fool around with manually doing this sort of thing. It becomes unreadable and confusing. The "gratuitous" replacement with the symbol table works well as it is my preferred solution and now I only need to cut and paste :). Thank you once again. – systemcpro Aug 18 '21 at 07:27