C++ - How to use a stream to parse a file?

Question

I have a file and need to loop through it, reading from each line an int foo, a string type, and a 64- or 128-bit value. How would I use a stream to parse these lines into the variables below? I'd like to stick with the stream syntax ( ifs >> foo >> type ), but here type would end up holding the rest of the line after the 0/52, and at that point I'd just get a char* and use strtoull and such, so why use the stream in the first place? I'm hoping for readable code without horrid performance compared to char strings / strtok / strtoull.

//input file:
0ULL'04001C0180000000000000000EE317BC'
52L'04001C0180000000'
//output:
//0 ULL 0x04001C0180000000 0x000000000EE317BC
//52 L 0x04001C0180000000

  ifstream ifs("input.data");
  int foo;
  string type;
  unsigned long long ull[2];

score 8 · Answer 1 · edited May 23 '17 at 12:08

Boost Spirit implementation

Here is the mandatory Boost Spirit (Qi) based implementation. For good measure, including formatting using Boost Spirit (Karma):

#include <string>
#include <iostream>
#include <fstream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <vector> // std::vector is used for f2 below

namespace karma=boost::spirit::karma;
namespace qi   =boost::spirit::qi;

static qi::uint_parser<unsigned long long, 16, 16, 16> hex16_p; // parse long hex
static karma::uint_generator<unsigned long long, 16>   hex16_f; // format long hex

int main(int argc, char** args)
{
    std::ifstream ifs("input.data");
    std::string line;
    while (std::getline(ifs, line))
    {
        std::string::iterator begin = line.begin(), end = line.end();

        int                             f0;
        std::string                     f1;
        std::vector<unsigned long long> f2;

        bool ok = parse(begin, end,
                qi::int_                    // an integer
                >> *qi::alpha               // alternatively: *(qi::char_ - '\'')
                >> '\'' >> +hex16_p >> '\'' // accepts 'n x 16' hex digits
            , f0, f1, f2);

        if (ok)
            std::cout << "Parsed: " << karma::format(
                 karma::int_ 
                 << ' ' << karma::string 
                 << ' ' << ("0x" << hex16_f) % ' '
             , f0, f1, f2) << std::endl;
        else
            std::cerr << "Parse failed: " << line << std::endl;
    }

    return 0;
}

Test run:

Parsed: 0 ULL 0x4001c0180000000 0xee317bc
Parsed: 52 L 0x4001c0180000000

(See "Tweaks, samples" below for how to adjust e.g. the hex output.)

Benchmark

I benchmarked @Cubbi's version and the code above, as written, on the sample input you provided repeated 100,000 times. This initially gave Cubbi's version a slight advantage: 0.786s versus 0.823s.

Now, that of course wasn't a fair comparison, since my code constructs the parser on the fly on every iteration. With that taken out of the loop like so:

typedef std::string::iterator It;

const static qi::rule<It> parser = qi::int_ >> *qi::alpha >> '\'' >> +hex16_p >> '\'';
bool ok = parse(begin, end, parser, f0, f1, f2);

Boost Spirit comes out a clear winner at only 0.093s: already a factor of 8.5 faster, and that is even with the Karma formatter still being constructed on each iteration.

(With the output formatting commented out in both versions, Boost Spirit is more than 11x faster.)
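The answer does not say how the timings were taken. As a rough sketch only, a comparison like this could be timed with `std::chrono`; `time_seconds` is a name invented for this illustration, and it assumes C++11 and a callable that performs one full parse of the input.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical harness (not from the answer): run `work` `iterations` times
// and return the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F work, std::size_t iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Anything constructed inside `work` (a parser, a formatter, a string stream) is paid for on every iteration, which is the effect described above.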

Tweaks, samples

Note how you can easily tweak things:

//  >> '\'' >> +hex16_p >> '\'' // accepts 'n x 16' hex digits
    >> '\'' >> qi::repeat(1,2)[ hex16_p ] >> '\'' // accept 16 or 32 digits

Or format the hex output just like the input:

// ("0x" << hex16_f) % ' '
karma::right_align(16, '0')[ karma::upper [ hex16_f ] ] % ""

Changed sample output:

0ULL'04001C0180000000000000000EE317BC'
Parsed: 0 ULL 04001C0180000000000000000EE317BC
52L'04001C0180000000'
Parsed: 52 L 04001C0180000000
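For comparison, roughly the same output layout can be produced with standard streams alone. This is a sketch assuming C++11; `format_hex16` is a made-up helper name, not part of Karma or of the code above.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: render each value as 16 upper-case hex digits,
// zero-padded, with no separator, mirroring the layout of the input.
std::string format_hex16(const std::vector<unsigned long long>& values)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (unsigned long long v : values)
        out << std::setw(16) << v; // setw applies to the next insertion only
    return out.str();
}
```

The hex, uppercase and fill settings stay in effect on the stream, while the width has to be set again before every value.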

HTH

Unfortunately I can't use boost, but +1 for the benchmark as it tells me Cubbi's performance is good enough for my case. And I may just have to play around with your code for fun anyway. — , Jun 02 '11 at 02:59
I realise this is a few months old, but that's a lovely Spirit example. +1 to your Karma. — icabod, Mar 08 '12 at 15:31
["With constexpr format string compilation fmt::format/fmt::compile is about as fast on integer formatting as Karma generate on their own benchmark and ~3.5x faster than printf. There is still room for improvement though"](https://twitter.com/vzverovich/status/1168545905504485380) — sehe, Sep 02 '19 at 16:53

score 4 · Accepted Answer · edited May 23 '17 at 11:53

This is a rather trivial task for a more sophisticated parser such as boost.spirit.

To solve this using just the standard C++ streams you would need to

a) treat ' as whitespace and
b) take an extra pass over the string "04001C0180000000000000000EE317BC" which has no separators between the values.
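Step (b) can be sketched on its own, without any stream machinery. The helper below, `split_hex16` (a name made up for this illustration), cuts a run of hex digits into 16-digit chunks and converts each one with `std::stoull`. It assumes C++11 and, like the loop in the full program, ignores a trailing fragment shorter than 16 digits.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a run of hex digits into 64-bit values,
// 16 digits at a time. A trailing fragment shorter than 16 digits is ignored.
std::vector<unsigned long long> split_hex16(const std::string& digits)
{
    std::vector<unsigned long long> values;
    for (std::size_t pos = 0; pos + 16 <= digits.size(); pos += 16)
    {
        // std::stoull throws std::invalid_argument if no conversion
        // can be performed on the chunk.
        values.push_back(std::stoull(digits.substr(pos, 16), nullptr, 16));
    }
    return values;
}
```

For the first sample line this would yield two values, and for the second line one.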

Borrowing Jerry Coffin's sample facet code,

#include <iostream>
#include <fstream>
#include <locale>
#include <vector>
#include <sstream>
#include <iomanip>
#include <string>

struct tick_is_space : std::ctype<char> {
    tick_is_space() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
               rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc['\''] = std::ctype_base::space;
        return &rc[0];
    }
};

int main()
{
    std::ifstream ifs("input.data");
    ifs.imbue(std::locale(std::locale(), new tick_is_space()));
    int foo;
    std::string type, ullstr;
    while( ifs >> foo >> type >> ullstr)
    {
        std::vector<unsigned long long> ull;
        while(ullstr.size() >= 16) // sizeof(unsigned long long)*2
        {
            std::istringstream is(ullstr.substr(0, 16));
            unsigned long long tmp;
            is >> std::hex >> tmp;
            ull.push_back(tmp);
            ullstr.erase(0, 16);
        }
        std::cout << std::dec << foo << " " << type << " "
                  << std::hex << std::setfill('0');
        for(size_t p=0; p<ull.size(); ++p) // write "0x" by hand so the zero padding follows the prefix
            std::cout << "0x" << std::setw(16) << ull[p] << ' ';
        std::cout << '\n';
    }
}

test: https://ideone.com/lRBTq

As an exercise I wrote [the Spirit version](http://stackoverflow.com/questions/6206302/c-how-to-use-a-stream-to-parse-a-file/6208159#6208159); I also benchmarked the performance difference, which is quite interesting (I did _not_ benchmark compile times of course :^]) — sehe, Jun 01 '11 at 22:27
I do so much like this better than the strtok code from 15 years ago that I'm looking at. — , Jun 02 '11 at 03:00
