Parsing fields from stdin in C++

Question

Suppose my stdin input looks like this:

2014-01-23,  AA, 20
2014-05-30,  BB,2    // note: the spaces around the fields are optional
2015-03-24, CC,   5
//...
//... and so on

How do I write a C++ program that efficiently parses the year and month, and also the subsequent fields? I am really stuck on this parsing issue.

What I want to do with the subsequent fields is store them in a map: AA, 20 becomes map["AA"] = 20, and so on.

I can do the map part myself, but I can't figure out how to read and parse the input. Please help.

Attempt:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    int year, month;
    int num;
    string key;
    map<string, int> mapping;
    string s;
    getline(cin, s, '-');          // year
    year = stoi(s);
    getline(cin, s, '-');          // month
    month = stoi(s);
    getline(cin, s, ',');          // day (read and discarded)
    // reading the AA, BB, CC field; erase-remove deletes every space
    getline(cin, s, ',');
    s.erase(remove(s.begin(), s.end(), ' '), s.end());
    key = s;
    // now, reading the number field following AA, BB, CC
    getline(cin, s, '\n');
    s.erase(remove(s.begin(), s.end(), ' '), s.end());
    num = stoi(s);
    mapping[key] = num;
    return 0;
}

You need to take this one step at a time. First, write a program that reads the input one line at a time. Step two: parse each line of text into the individual fields. Step three: parse the first field into its components: year, month, and day. Problem solved. See how easy it was? – Sam Varshavchik, Nov 01 '16 at 02:26
There's an old Vulcan proverb: the longer the code, the likelier it is to have a bug. – Sam Varshavchik, Nov 01 '16 at 02:42
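The three steps in the comment above can be sketched as follows. This is a minimal sketch, assuming every line has exactly three comma-separated fields and a YYYY-MM-DD date; the names `Record`, `trim`, `split`, and `read_records` are illustrative and not from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One parsed input line, e.g. "2014-01-23,  AA, 20".
struct Record {
    int year = 0, month = 0, day = 0;
    std::string key;
    int value = 0;
};

// Removes leading and trailing spaces, tabs, and '\r' (the spaces are optional).
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Splits text on `delim` and trims each field.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delim)) fields.push_back(trim(field));
    return fields;
}

// Step 1: read one line at a time. Step 2: split it into fields on ','.
// Step 3: split the date field on '-'. Lines without exactly three fields
// and a three-part date are skipped; std::stoi throws on non-numeric text.
std::vector<Record> read_records(std::istream& input) {
    std::vector<Record> records;
    std::string line;
    while (std::getline(input, line)) {
        std::vector<std::string> fields = split(line, ',');
        if (fields.size() != 3) continue;
        std::vector<std::string> date = split(fields[0], '-');
        if (date.size() != 3) continue;
        Record r;
        r.year = std::stoi(date[0]);
        r.month = std::stoi(date[1]);
        r.day = std::stoi(date[2]);
        r.key = fields[1];
        r.value = std::stoi(fields[2]);
        records.push_back(r);
    }
    return records;
}
```

To read real input, pass `std::cin` to `read_records` and copy each record's `key` and `value` into a `std::map<std::string, int>`.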

phuclv · Accepted Answer · 2020-10-02T00:01:03.130

Another option is to use std::regex (or Boost.Regex if you're on an "ancient" compiler).

Match each line against this pattern:

(\d{4})\-(\d{2})\-(\d{2}),\s*(.+),\s*(.+)

Then read the year, month, day, first field, and second field from capture groups \1, \2, \3, \4, and \5 respectively (m[1] to m[5] when the match is stored in a std::smatch m).
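A sketch of the regex approach, assuming a C++11 `<regex>`. It writes the pattern as a raw string literal so the backslashes need no doubling, builds the `std::regex` once and reuses it for every line, and leaves the hyphens unescaped, since `-` is literal outside a character class. `Record`, `parse_line`, and `read_map` are hypothetical names.

```cpp
#include <istream>
#include <map>
#include <regex>
#include <string>

// One parsed input line, e.g. "2014-01-23,  AA, 20".
struct Record {
    int year = 0, month = 0, day = 0;
    std::string key;
    int value = 0;
};

// Matches one whole line against the pattern and copies the five capture
// groups into `out`. Returns false if the line does not match.
bool parse_line(const std::string& line, Record& out) {
    // Raw string literal: no doubled backslashes. Built once, reused per call.
    static const std::regex pattern(R"((\d{4})-(\d{2})-(\d{2}),\s*(.+),\s*(.+))");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    out.year = std::stoi(m[1].str());
    out.month = std::stoi(m[2].str());
    out.day = std::stoi(m[3].str());
    out.key = m[4].str();
    out.value = std::stoi(m[5].str());  // stops at the first non-digit; throws if no number
    return true;
}

// Reads every line from `input` and stores key -> value for matching lines.
std::map<std::string, int> read_map(std::istream& input) {
    std::map<std::string, int> mapping;
    std::string line;
    Record r;
    while (std::getline(input, line))
        if (parse_line(line, r)) mapping[r.key] = r.value;
    return mapping;
}
```

Note that `(.+)` also captures any spaces just before a comma, so a key written as `AA ,20` would come out as `"AA "`; trim the key if that can occur.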

edited Oct 02 '20 at 00:01

answered Nov 01 '16 at 02:42

phuclv

If I have a very large amount of data (many lines to read), would this method still be efficient? – wrek Nov 01 '16 at 05:07
It depends. The only way to know is to benchmark it. A compiled regex can be reused, so it can perform quite well, and unlike a fixed hand-written parser it is easy to change. – phuclv Nov 01 '16 at 05:29
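A benchmark along those lines might look like this sketch. It times one reused `std::regex` against a fixed-format `std::sscanf` parser (a swapped-in hand-written alternative, not something from this thread) on synthetic lines; `make_lines`, `sum_regex`, `sum_sscanf`, and `time_ms` are hypothetical helpers, and summing the values is only there to give both parsers a result to compare.

```cpp
#include <chrono>
#include <cstdio>
#include <regex>
#include <string>
#include <vector>

// Builds n synthetic lines in the question's format (hypothetical test data).
std::vector<std::string> make_lines(int n) {
    std::vector<std::string> lines;
    lines.reserve(n);
    for (int i = 0; i < n; ++i)
        lines.push_back("2014-01-23,  K" + std::to_string(i % 100) + ", " +
                        std::to_string(i % 50));
    return lines;
}

// Parser 1: one std::regex, compiled once and reused for every line.
long long sum_regex(const std::vector<std::string>& lines) {
    static const std::regex pattern(R"((\d{4})-(\d{2})-(\d{2}),\s*(.+),\s*(.+))");
    std::smatch m;
    long long sum = 0;
    for (const std::string& line : lines)
        if (std::regex_match(line, m, pattern)) sum += std::stoi(m[5].str());
    return sum;
}

// Parser 2: a fixed-format std::sscanf call. In the format, a space skips
// any amount of whitespace and %63[^,] reads the key up to the next comma.
long long sum_sscanf(const std::vector<std::string>& lines) {
    long long sum = 0;
    for (const std::string& line : lines) {
        int year, month, day, value;
        char key[64];
        if (std::sscanf(line.c_str(), "%d-%d-%d , %63[^,], %d",
                        &year, &month, &day, key, &value) == 5)
            sum += value;
    }
    return sum;
}

// Runs f once, stores its result, and returns the elapsed wall time in ms.
template <typename F>
double time_ms(F f, long long& result) {
    auto start = std::chrono::steady_clock::now();
    result = f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In `main`, call `time_ms` for each parser on a large `make_lines(n)` and print both times. Compile with optimizations enabled; the numbers depend heavily on the compiler, standard library, and flags, which is exactly why measuring beats guessing.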
This might be a dumb question, but how do I use `(\d{4})-(\d{2})-(\d{2}),\s*(.+),\s*(.+)` with regex? – wrek Nov 02 '16 at 02:48
[Learn](http://www.regular-expressions.info/) about regex first, then use [`regex_search` or `regex_match`](http://en.cppreference.com/w/cpp/regex) as in the examples there. – phuclv Nov 02 '16 at 03:00
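The two functions named in the comment differ in one important way: `std::regex_match` succeeds only if the pattern matches the entire string, while `std::regex_search` succeeds if it matches any substring. This sketch reuses just the date part of the pattern to show that; `date_re`, `is_only_a_date`, and `first_date_in` are illustrative names.

```cpp
#include <regex>
#include <string>

// The date part of the pattern, compiled once.
const std::regex date_re(R"((\d{4})-(\d{2})-(\d{2}))");

// std::regex_match: true only if the whole string is a date.
bool is_only_a_date(const std::string& s) {
    return std::regex_match(s, date_re);
}

// std::regex_search: finds a date anywhere in s; returns the first one,
// or an empty string if there is none.
std::string first_date_in(const std::string& s) {
    std::smatch m;
    if (std::regex_search(s, m, date_re)) return m.str(0);
    return "";
}
```

This is why a full-line pattern pairs naturally with `regex_match`, while `regex_search` suits pulling a fragment out of a longer line.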

score 0 · Answer 2 · answered Nov 01 '16 at 02:31

An answer to a similar problem was given here using std::basic_string::find. You can use -, the space character, and , as delimiters.
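A sketch of the find-based idea adapted to this input. Because the spaces are optional, it locates only the two `-` and the two `,` with `std::string::find` and trims spaces from the key, rather than treating spaces as delimiters. `Record`, `trim`, and `parse_with_find` are hypothetical names, and the sketch assumes exactly this field order.

```cpp
#include <string>

// One parsed input line, e.g. "2014-01-23,  AA, 20".
struct Record {
    int year = 0, month = 0, day = 0;
    std::string key;
    int value = 0;
};

// Removes leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Locates the two '-' and the two ',' with std::string::find, then converts
// the pieces between them. Returns false if a delimiter is missing;
// std::stoi skips leading spaces but throws on non-numeric text.
bool parse_with_find(const std::string& line, Record& out) {
    const std::string::size_type npos = std::string::npos;
    std::string::size_type dash1 = line.find('-');
    if (dash1 == npos) return false;
    std::string::size_type dash2 = line.find('-', dash1 + 1);
    if (dash2 == npos) return false;
    std::string::size_type comma1 = line.find(',', dash2 + 1);
    if (comma1 == npos) return false;
    std::string::size_type comma2 = line.find(',', comma1 + 1);
    if (comma2 == npos) return false;
    out.year = std::stoi(line.substr(0, dash1));
    out.month = std::stoi(line.substr(dash1 + 1, dash2 - dash1 - 1));
    out.day = std::stoi(line.substr(dash2 + 1, comma1 - dash2 - 1));
    out.key = trim(line.substr(comma1 + 1, comma2 - comma1 - 1));
    out.value = std::stoi(line.substr(comma2 + 1));
    return true;
}
```

Calling `parse_with_find` on each line from `std::getline(std::cin, line)` and storing `mapping[r.key] = r.value` gives the map the question asks for.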

answered Nov 01 '16 at 02:31

ZeroPad

score 0 · Answer 3 · answered Nov 01 '16 at 02:37

Try this:

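#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main() {
    std::map<std::string, int> mapping;
    std::string line;
    while (std::getline(std::cin, line)) {
        // Turn the '-' and ',' delimiters into spaces so >> can split the
        // fields; >> also skips the optional spaces.
        for (char& ch : line)
            if (ch == '-' || ch == ',') ch = ' ';
        std::istringstream in(line);
        int year, month, day, x;
        std::string key;
        if (in >> year >> month >> day >> key >> x) {
            mapping[key] = x;
            std::cout << year << " " << month << " " << key << " " << x << std::endl;
        }
    }
    return 0;
}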

answered Nov 01 '16 at 02:37

Genarito

[Why should I not #include <bits/stdc++.h>?](https://stackoverflow.com/q/31816095/995714) and [Why is “using namespace std;” considered bad practice?](https://stackoverflow.com/q/1452721/995714) – phuclv Oct 01 '20 at 02:53
Thank you for pointing it out. I wrote this answer a few years ago, when my knowledge was very limited. – Genarito Oct 01 '20 at 12:41
