
I have a simple parser which can parse lists of ints or quoted strings.

If I do the SIMPLE_CASE where I take the input to be:

std::string input1 = "{ INT: 42, 24 STR: \"Smith\", \"John\" }";

it parses correctly into my_record, which contains a list of ints and a list of std::string.

I want to modify this code to be generic so that it can take zero or more INT lists and zero or more STR lists in arbitrary order and stuff them into my_record in the proper order. I would like my second, more generic test case:

std::string input1 = "{ STR: \"Joe\" INT: 42, 24 STR: \"Smith\", \"John\" }";

to parse as:

client::my_record expected1 { { 42, 24 }, {"Joe", "Smith", "John"} }; 

The code below works fine if I run:

/tmp$ g++ -DSIMPLE_CASE -g -std=c++11 sandbox.cpp -o sandbox && ./sandbox 

but I'm not sure how to get the general case to work when running this:

/tmp$ g++ -g -std=c++11 sandbox.cpp -o sandbox && ./sandbox 

Code for sandbox.cpp

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>

#include <string>
#include <vector>
#include <iostream>
#include <algorithm>

namespace client
{
    namespace qi    = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    struct my_record
    {
        std::vector<int>          m_ints;
        std::vector<std::string>  m_strs;

        bool operator==( const my_record& other ) const
        {
            // vector's operator== also compares sizes; three-iterator std::equal does not
            return m_ints == other.m_ints
                && m_strs == other.m_strs;
        }
        bool operator!=( const my_record& other ) const
        {
            return ! operator==( other );
        }
        friend std::ostream& operator<<( std::ostream& os, const my_record& rec );
    };

    std::ostream& operator<<( std::ostream& os, const my_record& rec )
    {
        for( const auto& x : rec.m_ints )
            std::cerr << x << ' ';
        std::cerr << std::endl;

        for( const auto& x : rec.m_strs )
            std::cerr << x << ' ';
        std::cerr << std::endl;

    }
}

BOOST_FUSION_ADAPT_STRUCT(
    client::my_record,
        (std::vector<int>,          m_ints)
        (std::vector<std::string>,  m_strs)
)

namespace client
{
    template <typename Iterator>
    struct employee_parser : qi::grammar<Iterator, my_record(), ascii::space_type>
    {
        employee_parser() : employee_parser::base_type(start)
    {
        using qi::int_;
        using qi::lit;
        using qi::double_;
        using qi::lexeme;
        using ascii::char_;

        quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

#ifdef SIMPLE_CASE
        start %=
            '{'
            >>  int_list
            >>  str_list
            >>  '}'
            ;
#else
        // not sure how to approach this
        start %=
            '{'
            >>  *(int_list)  // want zero or more of these, in any order
            >>  *(str_list)  // want zero or more of these, in any order
            >>  '}'
            ;
#endif

        str_list %=
                lit( "STR:" ) >> quoted_string % ','    
                ;

        int_list %=
                lit( "INT:" ) >> int_ % ','
                ;
    }

    qi::rule<Iterator, std::string(), ascii::space_type>               quoted_string;
    qi::rule<Iterator, std::vector<std::string>(), ascii::space_type>  str_list;
    qi::rule<Iterator, std::vector<int>(),         ascii::space_type>  int_list;

    qi::rule<Iterator, my_record(), ascii::space_type>                 start;
    };
}

static int 
TryParse( const std::string& input, const client::my_record& expected )
{
    using boost::spirit::ascii::space;
    client::my_record                        rec;
    auto                                     iter = input.begin(), end = input.end();
    client::employee_parser<decltype(iter)>  g;
    const bool ok = phrase_parse( iter, end, g, space, rec );
    if ( !ok || iter!=end )
    {
        std::cerr << "failed to parse completely" << std::endl;
        return -1;
    } else if ( rec!=expected ) {
        std::cerr << "unexpected result in parse" << std::endl;
        std::cerr << rec;
        return -1;
    }
    return 0;
}

int 
main(int argc, char* argv[])
{
#ifdef SIMPLE_CASE
    client::my_record  expected1 { { 42, 24 }, {"Smith", "John"} }, emp;
    std::string        input1 = "{ INT: 42, 24 STR: \"Smith\", \"John\" }";
    return TryParse( input1, expected1 );
#else
    client::my_record  expected1 { { 42, 24 }, {"Joe", "Smith", "John"} }, emp;
    std::string        input1 = "{ STR: \"Joe\" INT: 42, 24 STR: \"Smith\", \"John\" }";
    return TryParse( input1, expected1 );
#endif

}
ildjarn
kfmfe04

2 Answers


Your grammar is wrong:

    start %=
        '{'
        >>  *(int_list)  // want zero or more of these, in any order
        >>  *(str_list)  // want zero or more of these, in any order
        >>  '}'
        ;

This accepts any number of int lists followed by any number of string lists. It cannot match int, string, int, or any other interleaving.

You need something like

    start %=
        '{'
         >> *( int_list  // want zero or more of these, in any order
             | str_list  // want zero or more of these, in any order
             )
        >>  
        '}'
        ;

You then need to plumb that into your data structure; be warned that you may have to use semantic actions.
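
The difference between the two rule shapes can be seen with a regular-expression analogy. This is only a sketch, not Spirit code: it assumes each list is abbreviated to one letter (`I` for an `INT:` list, `S` for a `STR:` list), and both function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// One letter per list: I = an "INT:" list, S = a "STR:" list.
// "SIS" stands for the question's second input: STR list, INT list, STR list.

// Same shape as  *(int_list) >> *(str_list)
bool matches_sequence_of_stars(const std::string& tokens)
{
    static const std::regex re("I*S*");
    return std::regex_match(tokens, re);
}

// Same shape as  *(int_list | str_list)
bool matches_star_of_alternatives(const std::string& tokens)
{
    static const std::regex re("(I|S)*");
    return std::regex_match(tokens, re);
}
```

With these definitions, `matches_sequence_of_stars("SIS")` is false, while `matches_star_of_alternatives("SIS")` is true. Both accept `"IS"`, which is why the SIMPLE_CASE input gives no hint of the problem.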

ALSO:

whilst I am here, I can't let this slide:

    std::ostream& operator<<( std::ostream& os, const my_record& rec )
    {
        for( const auto& x : rec.m_ints )
            std::cerr << x << ' ';
        std::cerr << std::endl;

        for( const auto& x : rec.m_strs )
            std::cerr << x << ' ';
        std::cerr << std::endl;

    }

It should be streaming to os, like this:

        for( const auto& x : rec.m_ints )
            os << x << ' ';
        os << '\n';

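Also, try to avoid std::endl in a stream insertion operator; it flushes the stream on every call. Use '\n' if you need a new line. The operator must also end with `return os;`, which the original omits.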

THE SOLUTION:

What was needed in the end was Phoenix functions: push_back and a binder.

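// Assumes the question's Fusion adaptation of my_record (needed by at_c<0>),
// namespace phx = boost::phoenix, and the Phoenix headers
// <boost/spirit/include/phoenix_stl.hpp> (push_back),
// <boost/spirit/include/phoenix_fusion.hpp> (at_c) and
// <boost/spirit/include/phoenix_bind.hpp> (bind).
template<typename Iterator>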
struct my_grammar 
: qi::grammar<Iterator, my_record(), ascii::space_type> {

    my_grammar() 
    : my_grammar::base_type(start) {

        quoted_string %= qi::lexeme['"' >> +(qi::char_ - '"') >> '"'];

        start = qi::lit("{")
                >>
                *( "INT:" >> qi::int_     
                    [ 
                        phx::push_back(
                            phx::at_c<0>(
                                qi::_val
                            ), 
                            qi::_1
                        ) 
                    ] % ","
                 | "STR:" >> quoted_string
                     [ 
                        phx::push_back(
                            phx::bind(
                                &my_record::m_strs,
                                qi::_val
                            ), 
                            qi::_1
                        ) 
                    ] % ","
                 )
                >> 
                "}"
                 ;
    }
    qi::rule<Iterator, std::string(), ascii::space_type> quoted_string;
    qi::rule<Iterator, my_record(),   ascii::space_type>   start;
};

The whole code listing can be seen here:

http://ideone.com/XW18Z2

Community
111111
  • yes, I know it's broken - I need a suggestion on how to do it right and that's why I posted... ...you got me thinking: I wonder if the permutation operator would work... (do I have to worry about semantic actions then?) – kfmfe04 Nov 27 '12 at 19:53
  • hehe - nice fix on that - lost train-of-thought while coding@2:30AM. gotta return os; there too... – kfmfe04 Nov 27 '12 at 19:56
  • @kfmfe04 did the new grammar work without SAs or are you on about the ostream? – 111111 Nov 27 '12 at 19:56
  • trying out new grammar first and then going to think about semantic actions (new grammar does make sense) – kfmfe04 Nov 27 '12 at 19:57
  • Yes, see if the new grammar works without actually trying to build a data structure (leave the output type blank) – 111111 Nov 27 '12 at 19:58
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/20193/discussion-between-kfmfe04-and-111111) – kfmfe04 Nov 27 '12 at 20:02
  • SPACE skipping version added – 111111 Nov 27 '12 at 21:50

An alternative using is_container and push_back_container instead of semantic actions:

Step 1: remove the BOOST_FUSION_ADAPT_STRUCT macro for my_record.

Step 2: change your start rule.

start %=
            '{'
            >>  *(int_list // want zero or more of these, in any order
                | str_list)  // want zero or more of these, in any order
            >>  '}'
            ;

Step 3: add the following specializations.

namespace boost { namespace spirit { namespace traits
{
    template <>
    struct is_container<client::my_record>: mpl::true_//my_record acts as a container
    {};

    template <>
    struct container_value<client::my_record>
    {
        typedef boost::variant<std::vector<int>,std::vector<std::string>> type;//The elements to add to that container are either vector<int> or vector<string>
    };


    template <>
    struct push_back_container<client::my_record,std::vector<int>>//when you add a vector of ints...
    {
        static bool call(client::my_record& c, std::vector<int> const& val)
        {
            c.m_ints.insert(c.m_ints.end(),val.begin(), val.end());//insert it at the end of your acumulated vector of ints
            return true;
        }
    };

    template <>
    struct push_back_container<client::my_record,std::vector<std::string>>//when you add a vector of strings
    {
        static bool call(client::my_record& c, std::vector<std::string> const& val)//insert it at the end of your acumulated vector of strings
        {
            c.m_strs.insert(c.m_strs.end(),val.begin(),val.end());
            return true;
        }
    };

}}}
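
The accumulation that the two `push_back_container` specializations perform can be checked in isolation, without Spirit. In this minimal sketch, `append_list` is a hypothetical stand-in for the `call` members, and `my_record` is cut down to the two vectors.

```cpp
#include <string>
#include <vector>

struct my_record
{
    std::vector<int>         m_ints;
    std::vector<std::string> m_strs;
};

// Hypothetical stand-ins for push_back_container<...>::call: one overload per
// list type, each appending to the matching member and reporting success.
bool append_list(my_record& c, const std::vector<int>& val)
{
    c.m_ints.insert(c.m_ints.end(), val.begin(), val.end());
    return true;
}

bool append_list(my_record& c, const std::vector<std::string>& val)
{
    c.m_strs.insert(c.m_strs.end(), val.begin(), val.end());
    return true;
}
```

Feeding the lists in the order they appear in the second test input (`{"Joe"}`, then `{42, 24}`, then `{"Smith", "John"}`) leaves `m_ints` as 42, 24 and `m_strs` as Joe, Smith, John, which is the expected record from the question.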

Here is the complete code as requested (compiles with g++ 4.7.1 and msvc11 if I create the expected result using several push_backs):

Updated the example to add another member vector of adapted structs.

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/struct.hpp>


#include <string>
#include <vector>
#include <iostream>

namespace client
{
    struct my_subrec
    {
        double foo;
        double bar;
        bool operator==( const my_subrec& other ) const
        {
            return foo==other.foo && bar==other.bar;
        }
    };

    std::ostream& operator<<( std::ostream& os, const my_subrec& rec )
    {
        os << rec.foo << "->" << rec.bar;
        return os;
    }   

}

BOOST_FUSION_ADAPT_STRUCT(client::my_subrec,
                (double, foo)
                (double, bar)
                )


namespace client
{
    namespace qi    = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;


    struct my_record
    {
        std::vector<int>          m_ints;
        std::vector<std::string>  m_strs;
        std::vector<my_subrec>    m_recs;

        bool operator==( const my_record& other ) const 
        {
            // vector's operator== also compares sizes; three-iterator std::equal does not
            return m_ints == other.m_ints
                && m_strs == other.m_strs
                && m_recs == other.m_recs;
        }
        bool operator!=( const my_record& other ) const
        {
            return ! operator==( other );
        }
        friend std::ostream& operator<<( std::ostream& os, const my_record& rec );
    };

    std::ostream& operator<<( std::ostream& os, const my_record& rec ) 
    {
        for( const auto& x : rec.m_ints )
            os << x << ' ';
        os << '\n';

        for( const auto& x : rec.m_strs )
            os << x << ' ';
        os << '\n';

        for( const auto& x : rec.m_recs )
            os << x << ' ';
        return os;
    }
}

//BOOST_FUSION_ADAPT_STRUCT(
//    client::my_record,
//        (std::vector<int>,          m_ints)
//        (std::vector<std::string>,  m_strs)
//)


namespace client
{
    template <typename Iterator>
    struct employee_parser : qi::grammar<Iterator, my_record(), ascii::space_type>
    {
        employee_parser() : employee_parser::base_type(start)
    {
        using qi::int_;
        using qi::lit;
        using qi::double_;
        using qi::lexeme;
        using ascii::char_;

        quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

#ifdef SIMPLE_CASE
        start %=
            '{'
            >>  int_list
            >>  str_list
            >>  '}'
            ;
#else
        // not sure how to approach this
        start %=
            '{'
            >>  *(int_list // want zero or more of these, in any order
                | str_list  // want zero or more of these, in any order
                | rec_list)
            >>  '}'
            ;
#endif

        str_list %=
                lit( "STR:" ) >> quoted_string % ','    
                ;

        int_list %=
                lit( "INT:" ) >> int_ % ','
                ;
        rec_list =
                lit( "REC:" ) >> rec % ','
                ;
        rec = double_ >> lit('-') >> double_
                ;
    }

    qi::rule<Iterator, std::string(), ascii::space_type>               quoted_string;
    qi::rule<Iterator, std::vector<std::string>(), ascii::space_type>  str_list;
    qi::rule<Iterator, std::vector<int>(),         ascii::space_type>  int_list;
    qi::rule<Iterator, client::my_subrec(), ascii::space_type> rec;
    qi::rule<Iterator, std::vector<client::my_subrec>(),ascii::space_type> rec_list;

    qi::rule<Iterator, my_record(), ascii::space_type>                 start;
    };
}

namespace boost { namespace spirit { namespace traits
{
    template <>
    struct is_container<client::my_record>: mpl::true_//my_record acts as a container
    {};

    template <>
    struct container_value<client::my_record>
    {
        typedef boost::variant<std::vector<int>,std::vector<std::string>,std::vector<client::my_subrec> >type;
        //The elements to add to that container are vector<int>, vector<string> or vector<my_subrec>
    };


    template <>
    struct push_back_container<client::my_record,std::vector<int>>//when you add a vector of ints...
    {
        static bool call(client::my_record& c, std::vector<int> const& val)
        {
            c.m_ints.insert(c.m_ints.end(),val.begin(), val.end());//insert it at the end of your acumulated vector of ints
            return true;
        }
    };

    template <>
    struct push_back_container<client::my_record,std::vector<std::string>>//when you add a vector of strings
    {
        static bool call(client::my_record& c, std::vector<std::string> const& val)//insert it at the end of your acumulated vector of strings
        {
            c.m_strs.insert(c.m_strs.end(),val.begin(),val.end());
            return true;
        }
    };

    template <>
    struct push_back_container<client::my_record,std::vector<client::my_subrec>>//when you add a vector of subrecs
    {
        static bool call(client::my_record& c, std::vector<client::my_subrec> const& val)//insert it at the end of your acumulated vector of subrecs
        {
            c.m_recs.insert(c.m_recs.end(),val.begin(),val.end());
            return true;
        }
    };

}}}

static int 
TryParse( const std::string& input, const client::my_record& expected )
{
    using boost::spirit::ascii::space;
    client::my_record                        rec;
    auto                                     iter = input.begin(), end = input.end();
    client::employee_parser<decltype(iter)>  g;
    const bool ok = phrase_parse( iter, end, g, space, rec );
    if ( !ok || iter!=end )
    {
        std::cerr << "failed to parse completely" << std::endl;
        return -1;
    } else if ( rec!=expected ) {
        std::cerr << "unexpected result in parse" << std::endl;
        std::cerr << rec;
        return -1;
    }
    std::cout << rec << std::endl;
    return 0;
}

int 
main(int argc, char* argv[])
{
#ifdef SIMPLE_CASE
    client::my_record  expected1 { {42, 24 }, {"Smith", "John"} }, emp;
    std::string        input1 = "{ INT: 42, 24 STR: \"Smith\", \"John\" }";
    return TryParse( input1, expected1 );
#else
    client::my_record  expected1 { { 42, 24,240 }, {"Joe", "Smith", "John"}, {{1.5,2.5}} }, emp;

    std::string        input1 = "{ STR: \"Joe\" INT: 42, 24 STR: \"Smith\", \"John\" INT: 240 REC: 1.5-2.5 }";
    return TryParse( input1, expected1 );
#endif

}
  • @kfmfe04 now compiles with g++ 4.7.1 (changed `.cend()` to `.end()`) –  Nov 27 '12 at 22:09
  • This is a very good answer as it keeps the grammar clean: unfortunately, when I try it, I get the dreaded wall-of-cryptic-template-errors from g++. If you managed to get it to build in the context of the original program (or some cleaned up version of it), please post the complete solution so I can adapt it. TYVM. – kfmfe04 Nov 28 '12 at 04:42
  • I've added a new question: http://stackoverflow.com/questions/13598230/how-to-compile-with-boostspirittraits as a follow-up to your suggestion solution. If you could answer there, it'd be great. – kfmfe04 Nov 28 '12 at 05:11
  • ty - you've saved me hours of debugging: I tried adding my own `my_subrec` for the last couple of hours without success, but yours works perfectly. I really appreciate your help - I will study your code with a fine-toothed comb. – kfmfe04 Nov 28 '12 at 10:17
  • @kfmfe04 In case you are interested in seeing another [ab]use of `is_container` in order to sidestep semantic actions, [here](http://ideone.com/szrRZD) you can find a version of the [roman number example](http://www.boost.org/libs/spirit/example/qi/roman.cpp). –  Nov 28 '12 at 11:20
  • interesting you brought that up (I think I saw that example before in the docs) - while researching this problem, I was wondering if using the permutation operator a ^ b could be used to pull down the results... ...still thinking about it. – kfmfe04 Nov 28 '12 at 13:12
  • @kfmfe04 I'm pretty sure that you can with 111111's answer. –  Nov 28 '12 at 13:22
  • I didn't have a look at your version of the roman number example until just now - very clever! BTW, I found this interesting thread: http://boost.2283326.n4.nabble.com/Parsing-a-sequence-in-a-unspecified-order-filling-a-fusion-struct-td3619481.html I lifted Tongari's code which uses the permutation operator as I suspected: http://ideone.com/7Pvtba Too late to play with now, but I will dig some more tomorrow. – kfmfe04 Nov 28 '12 at 19:47
  • Just [referred someone on the mailing list](http://boost.2283326.n4.nabble.com/Omitting-unused-type-inside-Kleene-Star-tp4643647p4643672.html) to this excellent answer. I hope it will give you the well-deserved karma (hah - dat pun)! – sehe Mar 02 '13 at 13:36