
I want to parse a string of numbers into a vector of elements. The string consists of blocks of four numbers, separated by ( ) : /, and each block is separated by a ;.

Specifically, the string has the format int(int):float/float; (see the code sample below). I could use a regular expression, but since the data is so structured, there is probably a simpler way to parse it. I'm currently using istringstream, but it feels a bit clumsy.

std::string line = "0(0):0/0;1(2):0.01/0.02;2(4):0.02/0.04;3(6):0.03/0.06;";
struct Element {
  int a;
  int b;
  float c;  // the third and fourth fields are floats in the input format
  float d;
};

std::vector<Element> elements = parse(line);


std::vector<Element> parse(std::string line)
{
  std::vector<Element> elements;
  std::istringstream iss(line);
  while(iss) {
    char dummy;
    Element element;

    iss >> element.a;
    iss.read(&dummy,sizeof(dummy)); // (
    iss >> element.b;
    iss.read(&dummy,sizeof(dummy)); // )
    iss.read(&dummy,sizeof(dummy)); // :
    iss >> element.c;
    iss.read(&dummy,sizeof(dummy)); // /
    iss >> element.d;
    iss.read(&dummy,sizeof(dummy)); // ;

    if (!iss) {break;}

    elements.push_back(element);
  }
  return elements;
}

My questions:

  1. What would be a good way to parse this? Should I use std::stringstream, read in number by number, and 'chop off' the characters in between, as done in the code sample?
  2. This code has a bug: it attempts to read one extra set of values, because while(iss) is still true after the last character has been read. How can I terminate this loop without testing after each iss>>? Or, more generally, how do I loop over extractions from an istringstream?
Jan Müller
  • `1*sizeof(char)` is guaranteed to be 1. In the expression `1*sizeof(char)`, the value `1` is a magic number that seems to refer to the number of chars in `dummy` and `sizeof(char)` seems to refer to the type of `dummy`. If you opt out of supplying the constant 1 for expressiveness or maintainability, why not simply use `sizeof(dummy)`? The current form is not any better than simply supplying 1 directly. – François Andrieux May 03 '17 at 18:19
  • Is it important to validate the format of the string, or is it known to be correct? If it's known to be valid you can simply tokenize the whole string on any sequence of non-numeric character. You are left with a list of numbers from which it's easy to construct `Element`. [How do I tokenize a string in C++?](http://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c) – François Andrieux May 03 '17 at 18:23
  • Have you ever thought about a serializer? Reading data structures is a common problem with typically a common solution. – Klaus May 04 '17 at 07:23
  • Thanks for the comments about `1*sizeof(char)`. Also made the `parse` function return `std::vector`. I'll look into boost/tokenizer, looks to do what I was looking for. – Jan Müller May 04 '17 at 07:34
  • @Klaus, can you recommend a specific serializer? Preferably, the serialized data should be text and not binary, i.e. human readable. – Jan Müller May 04 '17 at 14:20
  • @JanMüller: boost::serialize can write text files. I don't know how configuring your own file format works with Boost, but the docs and examples are good. – Klaus May 04 '17 at 17:45
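
The tokenizing idea from the comments above can be sketched without Boost. This is only a minimal illustration, assuming the input is already known to be well formed (it does no validation); the function name `extract_numbers` is made up for this example.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns every separator into a space and then
// reads whatever numbers remain. It does NOT check the format.
std::vector<double> extract_numbers(std::string text)
{
    const std::string separators = "():/;";
    std::replace_if(text.begin(), text.end(),
                    [&separators](char ch) {
                        return separators.find(ch) != std::string::npos;
                    },
                    ' ');

    std::istringstream iss{text};
    std::vector<double> numbers;
    double value;
    // The extraction is the loop condition, so the loop stops cleanly
    // at the end of the input.
    while (iss >> value)
        numbers.push_back(value);
    return numbers;
}
```

Every consecutive group of four values then corresponds to one `Element`. Since all values come back as `double`, the first two of each group would still need converting to `int`.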

1 Answer


Your data are well structured, so you can overload operator>> to extract the class members from a std::istream. The same operator then works whether you read from an istringstream or from a file stream.

Here is a possible implementation:

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <cstdlib>   // EXIT_FAILURE

class Element
{
public:
    Element() : a{0}, b{0}, c{0.0f}, d{0.0f} {}
    Element(int aa, int bb, float cc, float dd) : a{aa}, b{bb}, c{cc}, d{dd} {}

    friend std::istream &operator>> (std::istream &in, Element &e);
    friend std::ostream &operator<< (std::ostream &out, Element const &e);

private:
    int a;
    int b;
    float c;
    float d;
};

std::istream &operator>> (std::istream &in, Element &e)
{
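    char delimiter;
    // Read each field followed by its delimiter and compare the delimiter
    // with the expected character. A wrong delimiter makes the condition
    // false without the stream noticing, so failbit is set by hand.
    // Reaching the end of the input is not flagged as a format error here.
    if ( not ( in >> e.a >> delimiter  and  delimiter == '(' and
               in >> e.b >> delimiter  and  delimiter == ')' and
               in >> delimiter         and  delimiter == ':' and
               in >> e.c >> delimiter  and  delimiter == '/' and
               in >> e.d >> delimiter  and  delimiter == ';' )
         and not in.eof() )
    {
        in.setstate(std::ios_base::failbit);
    }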
    return in;
}

std::ostream &operator<< (std::ostream &out, Element const &e)
{
    return out << e.a << '(' << e.b << "):" << e.c << '/' << e.d << ';';
}

std::vector<Element> read_Elements_from(std::istream &in)
{
    std::vector<Element> tmp (
        std::istream_iterator<Element>{in},
        std::istream_iterator<Element>{}
    );
    if ( not in.eof() )
        throw std::runtime_error("Wrong format");

    return tmp;
}

int main()
{
  try
  {
    using std::cout;
    std::istringstream iss {
        "0(0):0/0;1(2):0.01/0.2;2(4):0.02/0.04;3(6):0.03/0.06;"
    };

    auto els_s = read_Elements_from(iss);

    cout << "Elements read from the string:\n";
    for ( auto const &i : els_s )
    {
        cout << i << '\n';
    }

    // assuming a file whose lines are similar to the string provided
    std::ifstream input_file {"input_data.txt"};
    if ( not input_file )
        throw std::runtime_error("Can't open input file");

    auto els_f = read_Elements_from(input_file);

    cout << "\nElements read from the file:\n";
    for ( auto const &i : els_f )
    {
        cout << i << '\n';
    }
  }
  catch ( std::exception const &e )
  {
      std::cerr << "\nAn unexpected problem caused this application to end:\n\n"
                << e.what() << ".\n\n";
      return EXIT_FAILURE;
  }
}
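
As for the second question (how to loop over extractions without an extra iteration), the usual idiom is to make the extraction itself the loop condition. Below is a minimal sketch of that idea, separate from the class above; the names `Item`, `read_item` and `parse_items` are made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain struct mirroring the int(int):float/float; format.
struct Item {
    int a = 0;
    int b = 0;
    float c = 0.0f;
    float d = 0.0f;
};

// Returns true only if a complete block was read and every delimiter matched.
bool read_item(std::istream &in, Item &item)
{
    char c1, c2, c3, c4, c5;
    return (in >> item.a >> c1 >> item.b >> c2 >> c3
               >> item.c >> c4 >> item.d >> c5)
        && c1 == '(' && c2 == ')' && c3 == ':' && c4 == '/' && c5 == ';';
}

std::vector<Item> parse_items(std::string const &line)
{
    std::vector<Item> items;
    std::istringstream iss{line};
    Item item;
    // A failed read ends the loop before anything is pushed, so there is
    // no extra iteration after the last block.
    while (read_item(iss, item))
        items.push_back(item);
    return items;
}
```

Calling `parse_items(line)` on the string from the question should give four items. Note that this sketch silently stops at the first malformed block instead of reporting an error, which may or may not be what you want.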
Bob__