C++ parse input

Question

I need to parse a C++ stdin input which looks something like this:

N M (pairs)

0 0
2 1 (0,1)
2 0
5 8 (0,1) (1,3) (2,3) (0,2) (0,1) (2,3) (2,4) (2,4)

If N > 0 && M > 0, then M pairs will follow. It is a single line input so I have no idea how to do it.

I have some solution, but something tells me it's not the best one.

void input(){
    int a[100][2];
    int n,m;
    char ch;
    cin >> n >> m;
    for ( int i = 0; i < m; i++) {
        cin >> ch >> a[i][0]>> ch>> a[i][1]>>ch;    
    }

    cout << n << " " << m << " \n";

    for ( int i=0; i < m; i++ ) {
        cout << "(" << a[i][0] << " ," << a[i][1] << ")";   
    }
}

My question is what is the best / more correct way to do this?

I assume `m` means the number of pairs that need to follow it. Is `n` somehow meaningful in a similar way? — jrok, Jul 18 '12 at 12:36
m represents the number of pairs yes - n is not meaningful in any way for the current problem. — Ext, Jul 18 '12 at 12:37
you will find a helpful code snippets and discussion which will help you in http://stackoverflow.com/questions/11535854/why-is-the-sort-not-working. This is not a direct answer to your problem, but it gives enough code pieces which you could utilize — PermanentGuest, Jul 18 '12 at 12:41

Filip Roséen - refp · Accepted Answer · 2012-07-18T13:54:11.280

Since input data to applications never can be trusted there is an importance of adding error checks to see that the data provided is indeed valid (otherwise the result of the application might suffer from errors while parsing).

The "C++ way" of handling errors such as this is to throw an exception when a problem arise in the functions responsible for parsing data.

The caller of this function will then wrap the call in a try-catch-block to catch errors that might appear.

With a user-defined-type..

Defining your own type for holding your pairs of data will greatly improve the readability of your code, the output from the below implementation and the one found later in this post is the same.

#include <iostream>
#include <string>
#include <sstream>
#include <stdexcept>

struct Pair {
  Pair (int a, int b)
    : value1 (a), value2 (b)
  {}

  static Pair read_from (std::istream& s) {
    int value1, value2;

    if ((s >> std::ws).peek () != '(' || !s.ignore () || !(s >> value1))
      throw std::runtime_error ("unexpected tokens; expected -> (, <value1>");

    if ((s >> std::ws).peek () != ',' || !s.ignore () || !(s >> value2))
      throw std::runtime_error ("unexpected tokens; expected -> , <value2>");

    if ((s >> std::ws).peek () != ')' || !s.ignore ())
      throw std::runtime_error ("unexpected token;expected -> )");

    return Pair (value1,value2);
  }

  int value1, value2;
};

^{The one thing I've noticed that might be hard for programmers to grasp about the above is the use of s >> std::ws; it's used to consume available white-spaces so that we can use .peek to get the next non-whitespace character available.}

^{The reason I implemented a static function read_from instead of ostream& operator>>(ostream&, Pair&) is that the later will require that we create an object before even reading from the stream, which in some cases are undesirable.}

void
parse_data () {
  std::string line;

  while (std::getline (std::cin, line)) {
    std::istringstream iss (line);
    int N, M;

    if (!(iss >> N >> M))
      throw "unable to read N or M";
    else
      std::cerr << "N = " << N << ", M = " << M << "\n";

    for (int i =0; i < M; ++i) {
      Pair data = Pair::read_from (iss);

      std::cerr << "\tvalue1 = " << data.value1 << ", ";
      std::cerr << "\tvalue2 = " << data.value2 << "\n";
    }
  }
}

^{Normally I wouldn't recommend naming non-const variables in only uppercase, but to make it more clear which variable contains what I use the same name as your description of the input.}

int
main (int argc, char *argv[])
{
  try {
    parse_data ();

  } catch (std::exception& e) {
    std::cerr << e.what () << "\n";
  }
}

Without the use of user-defined-types

The straight forward method of parsing the data as well as having checks for errors is to use something as the following, though it could be greatly improved by using User Defined Objects and operator overloads.

read each line using std::getline

construct n std::istringstream iss (line) with the line read

try to read two ints using iss >> N >> M

read M "words" using a std::string s1* with iss >> s1;

Construct a std::istringstream inner_iss using the s1 as initializer

peek to see that the next char available is ( && ignore this char

read integer

peek to see that the next char available is , && ignore this char

read integer

peek to see that the next char available is ) && ignore this char

If the stringstream isn't empty after step 4 or iss.good () returns false anywhere inbetween the steps the is a syntax error in the data read.

Sample implementation

The source can be found by following the link below (code put elsewhere to save space):

ideone.com - example snippet without using user-defined-types

N = 0, M = 0
N = 2, M = 1
     value1 = 0, value2 = 1
N = 2, M = 0
N = 5, M = 8
     value1 = 0, value2 = 1
     value1 = 1, value2 = 3
     value1 = 2, value2 = 3
     value1 = 0, value2 = 2
     value1 = 0, value2 = 1
     value1 = 2, value2 = 3
     value1 = 2, value2 = 4
     value1 = 2, value2 = 4

@Ext using your method there is no good way to make sure that the input is indeed in a valid format, you could for example have `"5 2 (0\n,1)\n\n(\n2,3)"` without your code complaining. — Filip Roséen - refp, Jul 18 '12 at 12:56
Actually you have trown to much effort in making error checks. But that is MY error! data will always come as presented (INT,INT) etc. But you did give me some pretty good ideas on how i must handle the input and the s >> std::ws is something new for me — Ext, Jul 18 '12 at 14:33
@Ext you can never be too sure about the input data, but I'm glad you found some parts of the answer contributing! — Filip Roséen - refp, Jul 18 '12 at 14:36
Why not to use a constructor? This make no need to repeat the members list nor call other ctor. — Isaac Pascual, Mar 21 '19 at 10:16

score 1 · Answer 2 · answered Jul 18 '12 at 12:40

If the requirement is that the data for an operation is all on a single line, then the best technique is probably to read the line into a string, and then parse a stringstream initialized from the input string.

You should think about whether you need to verify that the parentheses and commas are really parentheses and commas — would you generate an error if the input was:

23 2 @3;8= %      7      %     12     %

Your code would accept that as valid at the moment.

score 1 · Answer 3 · answered Jul 18 '12 at 12:44

The canonical solution for something like this is to define a type for the pairs, and implement a >> operator for it. Something like:

class Pair
{
    int first;
    int second;
public:
    Pair( int first, int second );
    //  ...
};

std::istream&
operator>>( std::istream& source, Pair& object )
{
    char open;
    char separ;
    char close;
    int first;
    int second;
    if ( source >> open >> first >> separ >> second >> close
            && open == '(' && separ == ',' && close == ')' ) {
        object = Pair( first, second );
    } else {
        source.setstate( std::ios_base::failbit );
    }
    return source;
}

Given that, to read the file:

std::string line;
while ( std::getline( source, line ) ) {
    std::istringstream l( line );
    int n;
    int m;
    std::vector<Pair> pairs;
    l >> n >> m;
    if ( !l ) {
        //  Syntax error...
    }
    Pair p;
    while ( l >> p ) {
        pairs.push_back( p );
    }
    if ( ! l.eof() ) {
        //  Error encountered somewhere...
    }
    //  Other consistency checks...
}

score 1 · Answer 4 · answered Jul 18 '12 at 13:01

I prefer Boost.Spirit for such tasks:

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/struct/adapt_struct.hpp>

#include <boost/fusion/include/std_pair.hpp>

#include <string>
#include <iostream>

struct input {
  int x, y;
  typedef std::pair<int, int> pair;
  std::vector< pair > pairs;
};

BOOST_FUSION_ADAPT_STRUCT(
  input,
  (int, x)
  (int, y)
  (std::vector< input::pair >, pairs))

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

template<typename Iterator>
struct input_parser : qi::grammar<Iterator, input(), ascii::space_type> {
  input_parser() : input_parser::base_type(start) {
    // two integers followed by a possibly empty list of pairs
    start = qi::int_ >> qi::int_ >> *pair;
    // a tuple delimited by braces and values separated by comma
    pair = '(' >> qi::int_ >> ',' >> qi::int_ >> ')';
  }

  qi::rule<Iterator, input(), ascii::space_type> start;
  qi::rule<Iterator, input::pair(), ascii::space_type> pair;
};

template<typename Iterator>
void parse_and_print(Iterator begin, Iterator end) {
    input x;
    input_parser<Iterator> p;
    bool r = qi::phrase_parse(begin, end, p, ascii::space, x);
    if(!r) {
      std::cerr << "Error parsing" << std::endl;
      return;
    }

    std::cout << "Output" << std::endl;
    std::cout << "x: " << x.x << std::endl;
    std::cout << "y: " << x.y << std::endl;
    if(x.pairs.empty()) {
      std::cout << "No pairs.";
    } else {
      for(std::vector<input::pair>::iterator it = x.pairs.begin(); 
          it != x.pairs.end(); ++it) { 
        std::cout << "(" << it->first << ',' << it->second << ") ";
      }
    }
    std::cout << std::endl;
}


int main()
{
    namespace qi = boost::spirit::qi;

    std::string input1 = "0 0";
    std::string input2 = "2 1 (0,1)";
    std::string input3 = "2 0";
    std::string input4 = "5 8 (0,1) (1,3) (2,3) (0,2) (0,1) (2,3) (2,4) (2,4)";
    parse_and_print(input1.begin(), input1.end());
    parse_and_print(input2.begin(), input2.end());
    parse_and_print(input3.begin(), input3.end());
    parse_and_print(input4.begin(), input4.end());
    return 0;
}

score -1 · Answer 5 · edited May 23 '17 at 11:44

-1

Since you'd have noticed the pattern in your input, anything like string tokenizer would solve your problem.

For that you could use strtok function. Also for Boost library implementation is useful and exemplified well here

edited May 23 '17 at 11:44

Community

1
1

answered Jul 26 '12 at 07:50

zeropoint

235
1
4
10

C++ parse input

5 Answers5

With a user-defined-type..

Without the use of user-defined-types

Sample implementation