I want to read a CSV file into a struct:

struct data 
{
   std::string a;
   std::string b;
   std::string c;
};

However, I want to read empty strings too, so that every value ends up in its proper place. I adapted the struct as a Boost.Fusion sequence, so the following works:

// Our parser (using a custom skipper to skip comments and empty lines )
template <typename Iterator, typename skipper = comment_skipper<Iterator> >
  struct google_parser : qi::grammar<Iterator, addressbook(), skipper>
{
  google_parser() : google_parser::base_type(contacts, "contacts")
  {
    using qi::eol;
    using qi::eps;
    using qi::_1;
    using qi::_val;
    using qi::repeat;
    using standard_wide::char_;
    using phoenix::at_c;
    using phoenix::val;

    value = *(char_ - ',' - eol) [_val += _1];

    // This works but only for small structs
    entry %= value >> ',' >> value >> ',' >> value >> eol;
  }

  qi::rule<Iterator, std::string()> value;
  qi::rule<Iterator, data()> entry;
};

Unfortunately, repeat stores only the non-empty values in a vector, so the attribute values may get mixed up (for example, if the field for b is empty, b may end up containing the content of c):

    entry %= repeat(2)[ value >> ','] >> value >> eol;

I would like to use a short rule similar to repeat, as my struct has 60 attributes in practice! Not only is writing 60 rules tedious, but Boost does not seem to like long rules...

alex_reader
  • I noticed - after writing the answer - you consider the input to be CSV. See [How to parse CSV using Spirit](http://stackoverflow.com/questions/18365463/h/18366335#18366335), and [this other answer](http://stackoverflow.com/questions/7436481/h/7462539#7462539) (and also [an adaptation for zero-copy parsing of a mapped file](http://stackoverflow.com/questions/23699731/s/23703810#23703810)). There's also this one that maps columns: [boost spirit parsing CSV with columns in variable order](http://stackoverflow.com/questions/27967195/b/27967473#27967473). For your inspiration – sehe Mar 09 '15 at 00:55
  • sehe, thank you for the in-depth answer. Not only are you extremely clear, but you took the trouble to write full examples. People like you make Stack Overflow worth it. – alex_reader Mar 09 '15 at 20:35

1 Answer

You just want to make sure you parse a value for "empty" strings too.

value = +(char_ - ',' - eol) | attr("(unspecified)");
entry = value >> ',' >> value >> ',' >> value >> eol;

See the demo:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

struct data {
    std::string a;
    std::string b;
    std::string c;
};

BOOST_FUSION_ADAPT_STRUCT(data, (std::string, a)(std::string, b)(std::string, c))

template <typename Iterator, typename skipper = qi::blank_type>
struct google_parser : qi::grammar<Iterator, data(), skipper> {
    google_parser() : google_parser::base_type(entry, "contacts") {
        using namespace qi;

        value = +(char_ - ',' - eol) | attr("(unspecified)");
        entry = value >> ',' >> value >> ',' >> value >> eol;

        BOOST_SPIRIT_DEBUG_NODES((value)(entry))
    }
  private:
    qi::rule<Iterator, std::string()> value;
    qi::rule<Iterator, data(), skipper> entry;
};

int main() {
    using It = std::string::const_iterator;
    google_parser<It> p;

    for (std::string input : { 
            "something, awful, is\n",
            "fine,,just\n",
            "like something missing: ,,\n",
        })
    {
        It f = input.begin(), l = input.end();

        data parsed;
        bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);

        if (ok)
            std::cout << "Parsed: '" << parsed.a << "', '" << parsed.b << "', '" << parsed.c << "'\n";
        else
            std::cout << "Parse failed\n";

        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}

Prints:

Parsed: 'something', 'awful', 'is'
Parsed: 'fine', '(unspecified)', 'just'
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'

However, you have a bigger problem. The assumption that qi::repeat(2) [ value ] will parse into 2 separate strings does not hold.

repeat, like operator*, operator+ and operator%, parses into a container attribute. In this case the container attribute (the first string) will receive the input from the second value as well:

Live On Coliru

Parsed: 'somethingawful', 'is', ''
Parsed: 'fine(unspecified)', 'just', ''
Parsed: 'like something missing: (unspecified)', '(unspecified)', ''

Since this is not what you want, reconsider your data types:

The auto_ approach:

If you teach Qi how to extract a single value, you can use a simple rule like

entry = skip(skipper() | ',') [auto_] >> eol;

This way, Spirit itself will generate the correct number of value extractions for the given Fusion sequence!

Here's a quick and dirty approach:

CAVEAT Specializing for std::string directly like this might not be the best idea (it might not always be appropriate and might interact badly with other parsers). However, by default create_parser<std::string> is not defined (because, what would it do?) so I seized the opportunity for the purpose of this demonstration:

namespace boost { namespace spirit { namespace traits {
    template <> struct create_parser<std::string> {
        typedef proto::result_of::deep_copy<
            BOOST_TYPEOF(
                qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
            )
        >::type type;

        static type call() {
            return proto::deep_copy(
                qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
            );
        }
    };
}}}

Again, see the demo output:

Live On Coliru

Parsed: 'something', 'awful', 'is'
Parsed: 'fine', 'just', '(unspecified)'
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'

NOTE There was some advanced sorcery to get the skipper to work "just right" (see skip()[] and lexeme[]). Some general explanations can be found here: Boost spirit skipper issues

UPDATE

The Container Approach

There's a subtlety to that. Two actually. So here's a demo:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

struct data {
    std::vector<std::string> parts;
};

BOOST_FUSION_ADAPT_STRUCT(data, (std::vector<std::string>, parts))

template <typename Iterator, typename skipper = qi::blank_type>
struct google_parser : qi::grammar<Iterator, data(), skipper> {
    google_parser() : google_parser::base_type(entry, "contacts") {
        using namespace qi;
        qi::as<std::vector<std::string> > strings;

        value = +(char_ - ',' - eol) | attr("(unspecified)");
        entry = strings [ repeat(2) [ value >> ',' ] >> value ] >> eol;

        BOOST_SPIRIT_DEBUG_NODES((value)(entry))
    }
  private:
    qi::rule<Iterator, std::string()> value;
    qi::rule<Iterator, data(), skipper> entry;
};

int main() {
    using It = std::string::const_iterator;
    google_parser<It> p;

    for (std::string input : { 
            "something, awful, is\n",
            "fine,,just\n",
            "like something missing: ,,\n",
        })
    {
        It f = input.begin(), l = input.end();

        data parsed;
        bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);

        if (ok) {
            std::cout << "Parsed: ";
            for (auto& part : parsed.parts) 
                std::cout << "'" << part << "' ";
            std::cout << "\n";
        }
        else
            std::cout << "Parse failed\n";

        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}
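
If the named fields of the original struct are still needed later on, one possible follow-up (an assumption on my part, not something the demo does) is to copy the parsed parts into a struct with named members after parsing. The names named_data and to_named below are hypothetical; for a 60-field struct the same idea applies with a longer initializer:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical struct mirroring the three-field `data` from the question.
struct named_data {
    std::string a;
    std::string b;
    std::string c;
};

// Hypothetical helper: map positional parts onto named members.
// Throws if the number of parts does not match the number of fields.
named_data to_named(const std::vector<std::string>& parts) {
    if (parts.size() != 3)
        throw std::invalid_argument("expected exactly 3 fields");
    return named_data{parts[0], parts[1], parts[2]};
}
```

The size check matters because a container attribute carries no compile-time guarantee about the field count; only the repeat count in the grammar enforces it.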

The subtleties are:

sehe
  • Added the approach that lets Qi generate the appropriate parser from the adapted fusion struct there: **[`entry = skip(skipper() | ',') [auto_] >> eol;`](http://coliru.stacked-crooked.com/a/30541461bab7e1a1)** – sehe Mar 09 '15 at 00:48
  • I would like to preserve my structure for later use, so I am hesitating between a container and your `auto_` approach. Unfortunately, the documentation about containers is a bit frightening and lacks examples. So I am more tempted to go with `auto_`, but all of your code seems like black magic to me :) – alex_reader Mar 09 '15 at 20:36
  • I personally recommend simplifying by matching the AST type to the grammar you wish to be using. There are some limitations already in the auto_ approach shown here (what about adjacent delimiters?) that will end the dream of automagic parser generation. – sehe Mar 09 '15 at 20:49
  • The approach with container attributes is simpler in almost all respects. – sehe Mar 09 '15 at 20:50
  • @alex_reader I was on mobile yesterday. I added a container sample to the answer because there are still some subtleties (see the explanation text). Even with those, it's way simpler than the 'hack' with `create_parser` though! – sehe Mar 10 '15 at 07:25
  • Thank you for your hard work ! The container approach works well and is quite elegant too :) – alex_reader Mar 11 '15 at 07:54