
I am trying to write an immediate rule for strings, ints and floats so that I can parse the following tests:

 //strings
 "\"hello\"",
 "   \"  hello \"  ",
 "  \"  hello \"\"stranger\"\" \"  ",
 //ints
 "1",
 "23",
 "456",
 //floats
 "3.3",
 "34.35"

Try it online: http://coliru.stacked-crooked.com/a/26fbd691876d9a8f

Using

qi::rule<std::string::const_iterator, std::string()> 
  double_quoted_string = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';

qi::rule<std::string::const_iterator, std::string()> 
  number = (+qi::ascii::digit >> *(qi::char_('.') >> +qi::ascii::digit));

qi::rule<std::string::const_iterator, std::string()>
  immediate = double_quoted_string | number;

gives me the correct result, but I need to use the double_ parser because I want to support exponential notation, NaN, etc.

But using

qi::rule<std::string::const_iterator, std::string()>
  immediate = double_quoted_string | qi::uint_ | qi::double_;

prints the following for the integer values:

"1" OK: ''
----
"23" OK: ''
----
"456" OK: '�'

and the double values fail to parse at all.

Tested on Coliru, on Win7 x64 with the latest VS2017, and with LLVM clang-cl.

Sometimes Coliru emits too many warnings and the compilation is halted.

Any idea what is happening here?

Do warnings in Spirit often mean "stop here, something is severely broken"?

UPDATE: it also happens if I use only double_. When I tested earlier, the behavior changed depending on whether the uint_ parser was present. Try: https://wandbox.org/permlink/UqgItWkfC2I8tkNF

sehe
llm

2 Answers


Wrap the integer and floating-point parsers in qi::raw so that the rule's std::string attribute receives the matched source text instead of a converted number: qi::raw[qi::uint_] and qi::raw[qi::double_].

The order of the alternatives also matters. If the uint_ parser comes before double_, as here:

immediate = double_quoted_string | qi::raw[qi::uint_] | qi::raw[qi::double_];
BOOST_SPIRIT_DEBUG_NODES((immediate)); // for debug output

then uint_ consumes only the integer part of a floating-point number, and the parse as a whole then fails:

<immediate>
  <try>34.35</try>
  <success>.35</success> //<----- this is what is left after uint_ parsed
  <attributes>[[3, 4]]</attributes> // <---- what uint_ parser successfully parsed
</immediate>
"34.35" Failed
Remaining unparsed: "34.35"

After swapping order of uint_ with double_:

immediate = double_quoted_string | qi::raw[qi::double_] | qi::raw[qi::uint_];

The result:

"\"hello\"" OK: 'hello'
----
"   \"  hello \"  " OK: '  hello '
----
"  \"  hello \"\"stranger\"\" \"  " OK: '  hello "stranger" '
----
"1" OK: '1'
----
"64" OK: '64'
----
"456" OK: '456'
----
"3.3" OK: '3.3'
----
"34.35" OK: '34.35'
----
doqtor

A loose definition of "parsing" is transforming a textual representation into another (often more native) representation.

It doesn't really make sense to "parse" a number into a std::string. What you're seeing is automatic attribute propagation trying very hard to make sense of it (by sticking the parsed number into a string as a character).

That's not what you wanted. Instead, you want to parse the integer value or the double value. For this, you could simply declare a variant attribute type:

using V = boost::variant<std::string, double, unsigned int>;
qi::rule<std::string::const_iterator, V()>
    immediate = double_quoted_string | qi::double_ | qi::uint_;

That's it. Live demo, adding type-checks on the result:

Live On Coliru

#include <iostream>
#include <iomanip>
#include <string>
#include <typeinfo>
#include <utility>
#include <vector>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
using namespace std::string_literals;

int main() {
    for (auto&& [str, type] : std::vector {
        std::pair("\"hello\""s,                typeid(std::string).name()),
        {"   \"  hello \"  "s,                 typeid(std::string).name()},
        {"  \"  hello \"\"stranger\"\" \"  "s, typeid(std::string).name()},
        {"1"s,                                 typeid(unsigned int).name()},
        {"23"s,                                typeid(unsigned int).name()},
        {"456"s,                               typeid(unsigned int).name()},
        {"3.3"s,                               typeid(double).name()},
        {"34.35"s,                             typeid(double).name()},
    }) {
        auto iter = str.cbegin(), end = str.cend();

        qi::rule<std::string::const_iterator, std::string()> double_quoted_string
            = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';

        using V = boost::variant<std::string, double, unsigned int>;
        qi::rule<std::string::const_iterator, V()> immediate
            = double_quoted_string | qi::double_ | qi::uint_;

        std::cout << std::quoted(str) << " ";

        V res;
        bool r = qi::phrase_parse(iter, end, immediate, qi::blank, res);
        // compare the type names by value rather than comparing const char* pointers
        bool typecheck = (std::string(type) == res.type().name());

        if (r) {
            std::cout << "OK: " << res << " typecheck " << (typecheck?"MATCH":"MISMATCH") << "\n";
        } else {
            std::cout << "Failed\n";
        }
        if (iter != end) {
            std::cout << "Remaining unparsed: " << std::quoted(std::string(iter, end)) << "\n";
        }
        std::cout << "----\n";
    }
}

Prints

"\"hello\"" OK: hello typecheck MATCH
----
"   \"  hello \"  " OK:   hello  typecheck MATCH
----
"  \"  hello \"\"stranger\"\" \"  " OK:   hello "stranger"  typecheck MATCH
----
"1" OK: 1 typecheck MISMATCH
----
"23" OK: 23 typecheck MISMATCH
----
"456" OK: 456 typecheck MISMATCH
----
"3.3" OK: 3.3 typecheck MATCH
----
"34.35" OK: 34.35 typecheck MATCH
----

Note the re-ordering of uint_ after double_. If you parse integers first, uint_ will parse the integer part of a double up to the decimal separator, and the rest is then left unparsed. The re-ordering also explains the MISMATCH lines above: double_ accepts "1", "23" and "456" as well, so uint_ is never reached. To be more accurate, you may want to use a strict real parser so that only numbers that actually have a fraction get parsed as doubles. This does limit the range for integral numbers, because unsigned int has a far smaller range than double.

See Parse int or double using boost spirit (longest_d)

Live On Coliru

    qi::rule<std::string::const_iterator, V()> immediate
        = double_quoted_string
        | qi::real_parser<double, qi::strict_real_policies<double> >{}
        | qi::uint_;

Prints

"\"hello\"" OK: hello typecheck MATCH
----
"   \"  hello \"  " OK:   hello  typecheck MATCH
----
"  \"  hello \"\"stranger\"\" \"  " OK:   hello "stranger"  typecheck MATCH
----
"1" OK: 1 typecheck MATCH
----
"23" OK: 23 typecheck MATCH
----
"456" OK: 456 typecheck MATCH
----
"3.3" OK: 3.3 typecheck MATCH
----
"34.35" OK: 34.35 typecheck MATCH
----
sehe
  • I'm still confused about the possibility of having string results that contain, more or less, the native type data. I think it's Spirit's compromise between type safety and flexibility – llm Mar 25 '20 at 07:04
  • On the contrary. *Your* code had `std::string res;`, compromising type safety for flexibility ([Stringly Typed](https://wiki.c2.com/?StringlyTyped)). *My* version has none of the issues by using a variant type (retaining full static typing with static polymorphism, also standardized as [`std::variant` with `std::visit`](https://en.cppreference.com/w/cpp/utility/variant)). If you're new to variant types, it will take a moment, but you'll soon find out that it is very different from dynamic typing (that would be more like [`std::any` or `boost::any`](https://en.cppreference.com/w/cpp/utility/any)) – sehe Mar 25 '20 at 08:32
  • `Your code had std::string res`: in the end it will all be type-safe, the way you showed. Currently I'm just using string because you've done that before in the examples – llm Mar 25 '20 at 08:39