I would like to use boost spirit to parse a single value that can have multiple types; e.g. something like:

singleValueToBeParsed %= (double_ | int_ | bool_ | genericString);

genericString   %= +(char_("a-zA-Z"));

The case of parsing to either an int or a double seems fairly straightforward:

Parse int or double using boost spirit (longest_d)

but I am uncertain how to extend this to other types, including generic strings and bools.

Any ideas?

Thanks,

Ben.

EDIT: So based on the answer, I have updated my grammar as follows:

genericString   %= +(char_("a-zA-Z"));
intRule         %= int_;

doubleRule      %= (&int_ >> (double_ >> 'f'))
                | (!int_ >> double_ >> -lit('f'));

boolRule        %= bool_;

where each qi rule exposes a string, int, double or bool attribute.

Then I have a rule

        add    %= string("add") >> '('
               >> (intRule | doubleRule | genericString) >> ','
               >> (intRule | doubleRule | genericString) >> ','
               >> genericString
               >> ')' >> ';';

which is expected to accept syntax such as add(5, 6.1, result); or add(a, b, result); but so far it only parses when the first two parameters are integers.

Note the add rule is specified as:

qi::rule<Iterator, Function(), ascii::space_type> add;

And Function is specified as:

typedef boost::any DirectValue;

struct Function
{
    //
    // name of the Function; will always be a string
    //
    std::string name;

    //
    // the function parameters which can be initialized to any type
    //
    DirectValue paramA;
    DirectValue paramB;
    DirectValue paramC;
    DirectValue paramD;
    DirectValue paramE;
};

BOOST_FUSION_ADAPT_STRUCT(
    Function,
    (std::string, name)
    (DirectValue, paramA)
    (DirectValue, paramB)
    (DirectValue, paramC)
    (DirectValue, paramD)
    (DirectValue, paramE)
)

EDIT 2:

Now it's parsing correctly. See http://liveworkspace.org/code/3asg0X%247 courtesy of llonesmiz. Cheers.

Ben J
  • I don't know if it's the only problem, but `doubleRule` needs to be before `intRule` in your `add` rule. –  Mar 05 '13 at 12:14
  • have you tried enabling `#define BOOST_SPIRIT_DEBUG` to see what decisions are being made? – sehe Mar 05 '13 at 12:21
  • @sehe http://liveworkspace.org/code/3asg0X$6 –  Mar 05 '13 at 12:29
  • [This](http://liveworkspace.org/code/3asg0X$7) seems to work. I've changed `doubleRule` and the order of the rules in the alternative operator in `start` (your `add`). I've also replaced your `any` with a variant because `any` doesn't work by default with `operator<<`. –  Mar 05 '13 at 12:40
  • Would you look at that Spirit library go. It's a pretty amazing thing, don't you think – sehe Mar 05 '13 at 13:32
  • Perfect! Thank you! @sehe -- indeed it's a beautiful thing! :-) – Ben J Mar 05 '13 at 13:38
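
The ordering point raised in the comments can be illustrated without Spirit. The sketch below is a standard-library stand-in, not Spirit code: `consumed_as_int`, `consumed_as_double` and `int_then_comma` are hypothetical helper names, and `std::strtol`/`std::strtod` only approximate what `int_` and `double_` do. It shows that an integer reader accepts the prefix `6` of `6.1`, so a following check for `,` sees `.` and fails. In the question's grammar there is a second obstacle: for input that starts with an integer, `doubleRule` only matches when an `f` suffix follows, so `6.1` is rejected by both of its branches.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: number of leading characters of `s` that read as a
// decimal integer (0 if none do).
std::size_t consumed_as_int(const std::string& s)
{
    char* end = 0;
    (void)std::strtol(s.c_str(), &end, 10);
    return static_cast<std::size_t>(end - s.c_str());
}

// Hypothetical helper: number of leading characters of `s` that read as a
// floating point value (0 if none do).
std::size_t consumed_as_double(const std::string& s)
{
    char* end = 0;
    (void)std::strtod(s.c_str(), &end);
    return static_cast<std::size_t>(end - s.c_str());
}

// Mimics "integer first, then a literal ','": succeeds only when the
// character right after the integer prefix is a comma.
bool int_then_comma(const std::string& s)
{
    const std::size_t n = consumed_as_int(s);
    return n > 0 && n < s.size() && s[n] == ',';
}
```

For the input `6.1,` the integer reader consumes one character and the floating point reader consumes three, so `int_then_comma("6.1,")` is false while `int_then_comma("5,")` is true. This is why the more specific floating point alternative has to be tried before the integer one.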

1 Answer

This is a fun exercise.

Of course, everything depends on the input grammar, which you conveniently fail to specify.

However, assuming for the sake of demonstration a literals grammar (very) loosely based on C++ literals, we could come up with the following to parse (signed) decimal integral values, floating point values, bool literals and simplistic string literals:

typedef boost::variant<
    double, unsigned int, 
    long, unsigned long, int, 
    bool, std::string> attr_t;

// ...

start = 
    (
        // number formats with mandatory suffixes first
        ulong_rule | uint_rule | long_rule | 
        // then those (optionally) without suffix
        double_rule | int_rule | 
        // and the simple, unambiguous cases
        bool_rule | string_rule
    );

double_rule = 
         (&int_ >> (double_ >> 'f'))     // if it could be an int, the suffix is required
       | (!int_ >> double_ >> -lit('f')) // otherwise, optional
       ;   
int_rule    = int_;
uint_rule   = uint_ >> 'u' ;
long_rule   = long_ >> 'l' ;
ulong_rule  = ulong_ >> "ul" ;
bool_rule   = bool_;
string_rule = '"' >> *~char_('"') >> '"';

See the linked live demonstration for the output of the test cases: http://liveworkspace.org/code/goPNP

Note: only one test input ("invalid") is supposed to fail. The rest should parse into a literal, optionally leaving unparsed remaining input.

Full Demonstration With Tests

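#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/variant.hpp>
#include <iostream>
#include <string>
#include <typeinfo>
#include <vector>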

namespace qi    = boost::spirit::qi;
namespace karma = boost::spirit::karma;

typedef boost::variant<double, unsigned int, long, unsigned long, int, bool, std::string> attr_t;

template <typename It, typename Skipper = qi::space_type>
    struct parser : qi::grammar<It, attr_t(), Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        start = 
            (
                // number formats with mandatory suffixes first
                ulong_rule | uint_rule | long_rule | 
                // then those (optionally) without suffix
                double_rule | int_rule | 
                // and the simple, unambiguous cases
                bool_rule | string_rule
            );

        double_rule = 
                 (&int_ >> (double_ >> 'f'))     // if it could be an int, the suffix is required
               | (!int_ >> double_ >> -lit('f')) // otherwise, optional
               ;   
        int_rule    = int_;
        uint_rule   = uint_ >> 'u' ;
        long_rule   = long_ >> 'l' ;
        ulong_rule  = ulong_ >> "ul" ;
        bool_rule   = bool_;
        string_rule = '"' >> *~char_('"') >> '"';

        BOOST_SPIRIT_DEBUG_NODE(start);
        BOOST_SPIRIT_DEBUG_NODE(double_rule);
        BOOST_SPIRIT_DEBUG_NODE(ulong_rule);
        BOOST_SPIRIT_DEBUG_NODE(long_rule);
        BOOST_SPIRIT_DEBUG_NODE(uint_rule);
        BOOST_SPIRIT_DEBUG_NODE(int_rule);
        BOOST_SPIRIT_DEBUG_NODE(bool_rule);
        BOOST_SPIRIT_DEBUG_NODE(string_rule);
    }

  private:
    qi::rule<It, attr_t(), Skipper> start;
    // no skippers in here (important):
    qi::rule<It, double()>        double_rule;
    qi::rule<It, int()>           int_rule;
    qi::rule<It, unsigned int()>  uint_rule;
    qi::rule<It, long()>          long_rule;
    qi::rule<It, unsigned long()> ulong_rule;
    qi::rule<It, bool()>          bool_rule;
    qi::rule<It, std::string()>   string_rule;
};

struct effective_type : boost::static_visitor<std::string> {
    template <typename T>
        std::string operator()(T const& v) const {
            return typeid(v).name();
        }
};

bool testcase(const std::string& input)
{
    typedef std::string::const_iterator It;
    auto f(begin(input)), l(end(input));

    parser<It, qi::space_type> p;
    attr_t data;

    try
    {
        std::cout << "parsing '" << input << "': ";
        bool ok = qi::phrase_parse(f,l,p,qi::space,data);
        if (ok)   
        {
            std::cout << "success\n";
            std::cout << "parsed data: " << karma::format_delimited(karma::auto_, ' ', data) << "\n";
            std::cout << "effective typeid: " << boost::apply_visitor(effective_type(), data) << "\n";
        }
        else      std::cout << "failed at '" << std::string(f,l) << "'\n";

        if (f!=l) std::cout << "trailing unparsed: '" << std::string(f,l) << "'\n";
        std::cout << "------\n\n";
        return ok;
    } catch(const qi::expectation_failure<It>& e)
    {
        std::string frag(e.first, e.last);
        std::cout << e.what() << "'" << frag << "'\n";
    }

    return false;
}

int main()
{
    for (auto const& s : std::vector<std::string> {
            "1.3f",
            "0.f",
            "0.",
            "0f",
            "0", // int will be preferred
            "1u",
            "1ul",
            "1l",
            "1",
            "false",
            "true",
            "\"hello world\"",
            // interesting cases
            "invalid",
            "4.5e+7f",
            "-inf",
            "-nan",
            "42 is the answer", // 'is the answer' is simply left unparsed, it's up to the surrounding grammar/caller
            "    0\n   ",       // whitespace is fine
            "42\n.0",           // but not considered as part of a literal
            })
    {
        testcase(s);
    }
}
sehe
  • +1 Great answer. Implicit lexeme rules are so much better than having to put `lexeme` everywhere. I believe [`qi::bool_`](http://www.boost.org/libs/spirit/doc/html/spirit/qi/reference/numeric/boolean.html) with the default policy behaves as your `bool_rule`. –  Mar 05 '13 at 06:28
  • Oh aha. I missed a spot removing the `lexeme` (I did remove them from the full sample) :) Also, I didn't actually _know_ there was `qi::bool_`. Fixing... – sehe Mar 05 '13 at 08:45
  • Based on your answer I have added more clarification to my question. It's still not parsing correctly (see edits). – Ben J Mar 05 '13 at 11:50