
I'm trying to use Spirit.Qi to parse a simple file format that has key-value pairs separated by an equals sign. The file also supports comments and blank lines, as well as quoted values.

I can get nearly all of this to work as expected. However, any blank lines or comments cause an empty key-value pair to be added to the map. When the map is swapped for a vector, no blank entries are produced.

Example Program:

#include <fstream> 
#include <iostream> 
#include <string> 
#include <map> 

#include "boost/spirit/include/qi.hpp" 
#include "boost/spirit/include/karma.hpp" 
#include "boost/fusion/include/std_pair.hpp" 

using namespace boost::spirit; 
using namespace boost::spirit::qi; 

//////////////////////////////////////////////////////////////////////////////// 
int main(int argc, char** argv) 
{ 
   std::ifstream ifs("file"); 
   ifs >> std::noskipws; 

   std::map< std::string, std::string > vars; 

   auto value = as_string[*print]; 
   auto quoted_value = as_string[lexeme['"' >> *(print-'"') >> '"']]; 
   auto key = as_string[alpha >> *(alnum | char_('_'))]; 
   auto kvp = key >> '=' >> (quoted_value | value); 

   phrase_parse( 
      istream_iterator(ifs), 
      istream_iterator(), 
      -kvp % eol, 
      ('#' >> *(char_-eol)) | blank, 
      vars); 

   std::cout << "vars[" << vars.size() << "]:" << std::endl; 
   std::cout << karma::format(*(karma::string << " -> " << karma::string << karma::eol), vars); 

   return 0; 
}

Input File:

one=two
three=four

# Comment
five=six

Output:

vars[4]:
 ->
one -> two
three -> four
five -> six

Where is the empty key value pair coming from? And how can I prevent it from being generated?

  • Just in case you like it, I have some inspirational answers around parsing INI files: [Parsing INI files with source line/column info](http://stackoverflow.com/a/8365427/85371); another [similar question that uses `//` style comments](http://stackoverflow.com/a/18081512/85371); Or you might consider [using Boost Property Tree INI file support](http://stackoverflow.com/a/6579183/85371) – sehe Apr 08 '15 at 09:44

1 Answer


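Firstly, your program has undefined behaviour (and indeed it crashes on my system). The reason is that you can't use auto variables to store stateful parser expressions: the expression templates hold references to temporaries that are gone by the time the parser is used.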

See "Assigning parsers to auto variables", "boost spirit V2 qi bug associated with optimization level" and others. See e.g. these answers for useful strategies to get around this limitation.

Secondly, the empty line is because of the grammar.

There's a difference between

  (-kvp) % qi::eol

or

  -(kvp % qi::eol)

The first means "optionally parse a kvp" followed by "push the result into the attribute container", once per list element. The push happens even when no kvp was matched, which is where the empty pair comes from.

The latter means optionally "parse 1 or more kvp into a container". Note that this won't push an empty value when nothing was matched.
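The difference can be modelled in plain C++ without Spirit. This is only a rough sketch of the attribute handling described above, not Spirit's actual implementation: match_kvp, optional_elements and optional_list are hypothetical names, and the line-by-line loop stands in for the list parser plus skipper.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using Pair = std::pair<std::string, std::string>;
using Map  = std::map<std::string, std::string>;

// Hypothetical stand-in for kvp: matches "key=value" on a single line.
// Blank lines and '#' comment lines do not match and leave `out` untouched.
bool match_kvp(const std::string& line, Pair& out) {
    if (line.empty() || line[0] == '#') return false;
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos || eq == 0) return false;
    out = Pair(line.substr(0, eq), line.substr(eq + 1));
    return true;
}

// Models (-kvp) % eol: the optional element always "succeeds", so the
// (possibly still default-constructed) pair is stored for every line.
Map optional_elements(const std::string& text) {
    Map out;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);) {
        Pair p;              // starts out as ("", "")
        match_kvp(line, p);  // result ignored, like an optional parser
        out.insert(p);       // pushed whether or not kvp matched
    }
    return out;
}

// Models -(kvp % +eol): only elements that actually matched are stored.
Map optional_list(const std::string& text) {
    Map out;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);) {
        Pair p;
        if (match_kvp(line, p)) {
            assert(!p.first.empty());  // a match always has a non-empty key
            out.insert(p);
        }
    }
    return out;
}
```

Fed the input file from the question, optional_elements ends up with four entries (one of them under the empty key, stored once because a map keeps unique keys), while optional_list holds only the three real pairs.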

Fixed/demo

I suggest

  • making key and value lexemes as well (just by dropping the Skipper from the rule declarations, really). You probably didn't want "key name 1=value 1" to parse as "keyname1" -> "value1", and you probably didn't want to allow "key # no value\n" either.
  • using BOOST_SPIRIT_DEBUG to see what's going on
  • not using a blanket "using namespace boost::spirit". It's a bad idea. Trust me :/
  • using rule declarations: they may appear verbose, but they do reduce the cruft in the rule definitions
  • using +eol instead of eol as the separator, which allows for empty lines; that appears to be what you want

Live On Coliru

#define BOOST_SPIRIT_DEBUG
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/karma.hpp"
#include "boost/fusion/include/std_pair.hpp"
#include <fstream>
#include <iostream>
#include <map>
#include <string>

namespace qi    = boost::spirit::qi;
namespace karma = boost::spirit::karma;

template <typename It, typename Skipper, typename Data>
struct kvp_grammar : qi::grammar<It, Data(), Skipper> {
    kvp_grammar() : kvp_grammar::base_type(start) {
        using namespace qi;

        value        = raw [*print];
        quoted_value = '"' >> *~char_('"') >> '"';
        key          = raw [ alpha >> *(alnum | '_') ];

        kvp          = key >> '=' >> (quoted_value | value);
        start        = -(kvp % +eol);

        BOOST_SPIRIT_DEBUG_NODES((value)(quoted_value)(key)(kvp))
    }
  private:
    using Pair = std::pair<std::string, std::string>;
    qi::rule<It, std::string(), Skipper> value;
    qi::rule<It, Pair(),        Skipper> kvp;
    qi::rule<It, Data(),        Skipper> start;
    // lexeme:
    qi::rule<It, std::string()> quoted_value, key;
};

template <typename Map>
bool parse_vars(std::istream& is, Map& data) {
    using It = boost::spirit::istream_iterator;
    using Skipper = qi::rule<It>;

    kvp_grammar<It, Skipper, Map> grammar;
    It f(is >> std::noskipws), l;

    Skipper skipper = ('#' >> *(qi::char_-qi::eol)) | qi::blank;
    return qi::phrase_parse(f, l, grammar, skipper, data); 
}

int main() { 
    std::ifstream ifs("input.txt"); 

    std::map<std::string, std::string> vars; 

    if (parse_vars(ifs, vars)) {
        std::cout << "vars[" << vars.size() << "]:" << std::endl; 
        std::cout << karma::format(*(karma::string << " -> " << karma::string << karma::eol), vars); 
    }
}

Output (currently broken on Coliru):

vars[3]:
five -> six
one -> two
three -> four

With debug info:

<kvp>
  <try>one=two\nthree=four\n\n</try>
  <key>
    <try>one=two\nthree=four\n\n</try>
    <success>=two\nthree=four\n\n# C</success>
    <attributes>[[o, n, e]]</attributes>
  </key>
  <quoted_value>
    <try>two\nthree=four\n\n# Co</try>
    <fail/>
  </quoted_value>
  <value>
    <try>two\nthree=four\n\n# Co</try>
    <success>\nthree=four\n\n# Comme</success>
    <attributes>[[t, w, o]]</attributes>
  </value>
  <success>\nthree=four\n\n# Comme</success>
  <attributes>[[[o, n, e], [t, w, o]]]</attributes>
</kvp>
<kvp>
  <try>three=four\n\n# Commen</try>
  <key>
    <try>three=four\n\n# Commen</try>
    <success>=four\n\n# Comment\nfiv</success>
    <attributes>[[t, h, r, e, e]]</attributes>
  </key>
  <quoted_value>
    <try>four\n\n# Comment\nfive</try>
    <fail/>
  </quoted_value>
  <value>
    <try>four\n\n# Comment\nfive</try>
    <success>\n\n# Comment\nfive=six</success>
    <attributes>[[f, o, u, r]]</attributes>
  </value>
  <success>\n\n# Comment\nfive=six</success>
  <attributes>[[[t, h, r, e, e], [f, o, u, r]]]</attributes>
</kvp>
<kvp>
  <try>five=six\n</try>
  <key>
    <try>five=six\n</try>
    <success>=six\n</success>
    <attributes>[[f, i, v, e]]</attributes>
  </key>
  <quoted_value>
    <try>six\n</try>
    <fail/>
  </quoted_value>
  <value>
    <try>six\n</try>
    <success>\n</success>
    <attributes>[[s, i, x]]</attributes>
  </value>
  <success>\n</success>
  <attributes>[[[f, i, v, e], [s, i, x]]]</attributes>
</kvp>
<kvp>
  <try></try>
  <key>
    <try></try>
    <fail/>
  </key>
  <fail/>
</kvp>
  • Added a demo **[Live On Coliru](http://coliru.stacked-crooked.com/a/e90b78d9b9581029)** – sehe Apr 08 '15 at 09:30
  • I knew not to use `auto` with Spirit, but totally forgot. Thanks. You were right on the money about why I kept getting an empty pair in the map. I also integrated your other comments. I guess my sample program was trying to be too simple. Thanks again. – Aaron Wright Apr 08 '15 at 21:52