
I want to parse a CSV-like file, line by line, with Boost. There are many different methods, such as split, tokenizer, Spirit, regex...

A line to parse could look like: "abc" "def" "hij \"hgfd\" " and the result should look like:

"abc"
"def"
"hij \"hgfd\" "

I thought that using Boost's tokenizer with the escaped_list_separator would be a great idea, but it is not possible to split on a whitespace delimiter, is it?

Mankarse
Roby
    CSV (and similar file formats) are *deceptively* easy to parse. I say deceptively, because there are many corner-cases that will cause problems, like you have noticed. To solve your problem you need a stateful parser, you need to keep a state telling you what kind of token you are parsing. For example, if the state says you're in a string you should read spaces and add them to the string instead of treating them as field separators. – Some programmer dude Jun 10 '15 at 08:58
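To illustrate the stateful approach described in the comment above, here is a minimal sketch that uses only the standard library. The function name `split_quoted_fields` and the exact rules are assumptions made for illustration, not something from the original posts: fields are separated by spaces or tabs, double quotes delimit a field that may contain whitespace, a backslash inside quotes escapes the next character, and the surrounding quotes are dropped from the result.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original posts): splits one line into
// fields separated by spaces or tabs. Double-quoted fields may contain
// whitespace, and a backslash inside quotes escapes the next character.
std::vector<std::string> split_quoted_fields(const std::string& line) {
    // The state records what kind of token is currently being read.
    enum class State { Between, Unquoted, Quoted, Escaped };

    std::vector<std::string> fields;
    std::string current;
    State state = State::Between;

    for (char c : line) {
        const bool is_blank = (c == ' ' || c == '\t');
        switch (state) {
        case State::Between:
            if (is_blank) {
                // Separator between fields: nothing to do.
            } else if (c == '"') {
                state = State::Quoted;      // opening quote, not stored
            } else {
                current += c;
                state = State::Unquoted;
            }
            break;
        case State::Unquoted:
            if (is_blank) {
                fields.push_back(current);  // whitespace ends the field
                current.clear();
                state = State::Between;
            } else {
                current += c;
            }
            break;
        case State::Quoted:
            if (c == '\\') {
                state = State::Escaped;     // next character is literal
            } else if (c == '"') {
                fields.push_back(current);  // closing quote ends the field
                current.clear();
                state = State::Between;
            } else {
                current += c;               // blanks are kept inside quotes
            }
            break;
        case State::Escaped:
            current += c;
            state = State::Quoted;
            break;
        }
    }

    // Assumption: an unquoted field at end of line, or an unterminated
    // quoted field, is kept rather than reported as an error.
    if (state != State::Between) {
        fields.push_back(current);
    }
    return fields;
}
```

Called on the line from the question, this sketch yields three fields: abc, def and hij "hgfd" followed by a trailing space. If the quotes and backslashes should be preserved in the output, as in the question's expected result, the Quoted and Escaped branches would have to append those characters as well.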

1 Answer

Here's a quick and dirty sample that matches just what you described, using Spirit (multiple lines into a std::vector<std::vector<std::string>>):


#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_match.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;

int main() {
    std::vector<std::vector<std::string>> csv_data;

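    // Each field is a double-quoted lexeme in which a backslash escapes the
    // next character. qi::blank skips spaces and tabs between fields, and
    // qi::eol separates rows.
    if (std::cin
            >> std::noskipws
            >> qi::phrase_match(*qi::lexeme['"' >> *('\\' >> qi::char_ | ~qi::char_("\r\n\"")) >> '"'] % qi::eol, qi::blank, csv_data))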
    {
        std::cout << "Parse succeeded: " << csv_data.size() << "\n";
        for(auto& row: csv_data) {
            for(auto& c: row) std::cout << c << '|';
            std::cout << "\n";
        }
    } else {
        std::cout << "Parse failed\n";
    }
}

The example prints:

Parse succeeded: 3
abc|def|hij "hgfd" |
qwehjr|aweqwejkl||

For a background on parsing (optionally) quoted delimited fields, including different quoting characters (', "), see here:

For a far more complete example, with support for partially quoted values and a

splitInto(input, output, ' ');

method that takes 'arbitrary' output containers and delimiter expressions, see here:

sehe
  • Wow, that is really expressive, elegant and flexible! Amazing! Could you write something about the techniques used in this example, so that I can learn them? I'm really new to this style of C++ and to Boost. Normally I mix ANSI C with a little bit of C++ (old standard). Is there a book available? – Roby Jun 10 '15 at 12:51
  • Thanks (I just noticed I had forgotten to copy the sample into the answer directly) – sehe Jun 10 '15 at 12:52
  • @Roby books: http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list - There's not really a boost book (but see http://theboostcpplibraries.com/) – sehe Jun 10 '15 at 14:51