read multiple lines but especially... parsing them efficiently

Question

I need to read multiple lines with specific keywords at the beginning. I have a basic problem and I'd need a hand to help me.

Here are the kind of input:

keyword1 0.0 0.0
keyword1 1.0 5.0
keyword2 10.0
keyword3 0.5
keyword4 6.0

rules are:

lines containing keyword1 & keyword2 SHOULD be in that order AND before any other lines.
lines containing keyword3 & keyword4 can be in any order
keyword1 HAS TO be followed by 2 double
keyword2, 3 & 4 HAVE TO be followed by 1 double
at the end of a block of lines containing all the four keyword followed by their double, the "loop" breaks and a calculation is triggered.

Here's the source I have:

using namespace std;

int main (int argc, const char * argv[]) {    
    vector<double> arrayInputs;
    string line;
    double keyword1_first, keyword1_second, keyword4, 
          keyword3, keyword2;
    bool inside_keyword1=false, after_keyword2=false, 
          keyword4_defined=false, keyword3_defined=false ;

//cin.ignore();

 while (getline(cin, line)) {
     if (inside_keyword1 && after_keyword2 && keyword3 && keyword4) {
         break;
     }
     else
     {
         std::istringstream split(line);
         std::vector<std::string> tokens;
         char split_char = ' ';
         for (std::string each; std::getline(split, each, split_char); tokens.push_back(each));

         if (tokens.size() > 2)
         {
             if (tokens[0] != "keyword1") return EXIT_FAILURE; // input format error
             else
             {
                 keyword1_first = atof(tokens[1].c_str());
                 keyword1_second = atof(tokens[2].c_str());

                 inside_keyword1 = true;
             }
         }
         else
         {
             if (tokens[0] == "keyword2")
             {
                 if (inside_keyword1)
                 {
                     keyword2 = atof(tokens[1].c_str());
                     after_keyword2 = true;
                 }

                 else return EXIT_FAILURE; // cannot define anything else keyword2 after keyword1 definition

             }
             else if (tokens[0] == "keyword3")
             {
                 if (inside_keyword1 && after_keyword2)
                 {
                     keyword3 = atof(tokens[1].c_str());
                     keyword3_defined  = true;
                 }
                 else return EXIT_FAILURE; // cannot define keyword3 outside a keyword1
             }
             else if (tokens[0] == "keyword4")
             {
                 if (inside_keyword1 && after_keyword2)
                 {
                     keyword4 = atof(tokens[1].c_str());
                     keyword4_defined  = true;
                 }
                 else return EXIT_FAILURE; // cannot define keyword4 outside a keyword1
             }
         }
     }
 }

 // Calculation


 // output


 return EXIT_SUCCESS;
}

My question is: Is there a more efficient way to go about this besides using booleans in the reading/parsing loop ?

You define your "problem" (in as much as you've laid out a task or assignment), and you provide source, but to readers you you haven't pinpointed *a question* for us. What is your question? — HostileFork says dont trust SE, Oct 16 '11 at 12:31
I'm very sorry. The problem is: would it exist another way more efficient to do that instead of using booleans in the reading/parsing loop ? — gluon, Oct 16 '11 at 14:45

score 2 · Answer 1 · edited May 23 '17 at 12:17

You ask about something "more efficient", but it seems you don't have a particular performance objective. So what you want here is probably more like a Code Review. There's a site for that, in particular:

https://codereview.stackexchange.com/

But anyway...

You are correct to intuit that four booleans are not really called for here. That's 2^4 = 16 different "states", many of which you should never be able to get to. (Your specification explicitly forbids, for instance, keyword3_defined == true when after_keyword1 == false).

Program state can be held in enums and booleans, sure. That makes it possible for a "forgetful" loop to revisit a line of code under different circumstances, yet still remember what phase of processing it is in. It's useful in many cases, including in sophisticated parsers. But if your task is linear and simple, it's better to implicitly "know" the state based on having reached a certain line of code.

As an educational example to show the contrast I'm talking about, here's a silly state machine to read in a letter A followed by any number of letter Bs:

enum State {
    beforeReadingAnA,
    haveReadAnA,
    readingSomeBs,
    doneReadingSomeBs
};

State s = beforeReadingAnA;
char c;
while(true) {
    switch (s) {
        case beforeReadingAnA: 
            cin >> c;
            if (cin.good() && c == 'A') {
                // good!  accept and state transition to start reading Bs...
                s = haveReadAnA;
            } else {
                // ERROR: expected an A
                return EXIT_CODE_FAILURE;
            };
            break;

         case haveReadAnA:
            // We've read an A, so state transition into reading Bs
            s = readingSomeBs;
            break;

         case readingSomeBs:
            cin >> c;
            if (cin.good() && c == 'B') {
                // good!  stay in the readingSomeBs state
            } else if (cin.eof()) {
                // reached the end of the input after 0 or more Bs
                s = doneReadingSomeBs;
            } else {
                // ERROR: expected a B or the EOF
                return EXIT_CODE_FAILURE;
            }
            break;

         case doneReadingSomeBs:
             // all done! 
             return EXIT_CODE_SUCCESS;
    }
}

As mentioned, it's a style of coding that can be very, very useful. Yet for this case it's ridiculous. Compare with a simple linear piece of code that does the same thing:

// beforeReadingAnA is IMPLICIT

char c;
cin >> c;
if (cin.fail() || c != 'A')
   return EXIT_CODE_FAILURE;

// haveReadAnA is IMPLICIT

do {
    // readingSomeBs is IMPLICIT

    cin >> c;
    if (cin.eof())
       return EXIT_CODE_SUCCESS;
    if (cin.fail() || c != 'B')
       return EXIT_CODE_FAILURE;
}

// doneReadingSomeBs is IMPLICIT

All the state variables disappear. They are unnecessary because the program just "knows where it is". If you rethink your example then you can probably do the same. You won't need four booleans because you can put your cursor on a line of code and say with confidence what those four boolean values would have to be if that line of code happens to be running.

As far as efficiency goes, the <iostream> classes can make life easier than you have it here and be more idiomatically C++ without invoking C-isms like atof or ever having to use c_str(). Let's look at a simplified excerpt of your code that just reads the doubles associated with "keyword1".

string line;
getline(cin, line);
istringstream split(line);
vector<string> tokens;
char split_char = ' ';
string each;
while (getline(split, each, split_char)) {
    tokens.push_back(each);
}
double keyword1_first, keyword1_second;
if (tokens.size() > 2) {
    if (tokens[0] != "keyword1") {
        return EXIT_FAILURE; // input format error
    } else {
        keyword1_first = atof(tokens[1].c_str());
        keyword1_second = atof(tokens[2].c_str());
    }
}

Contrast that with this:

string keyword;
cin >> keyword;
if (keyword != "keyword1") {
    return EXIT_FAILURE;
}
double keyword1_first, keyword1_second;
cin >> keyword1_first >> keyword1_second;

Magic. Iostreams can detect the type you are trying to read or write. If it encounters a problem interpreting the input in the way you ask for, then it will leave the input in the buffer so you can try reading it another way. (In the case of asking for a string, the behavior is to read a series of characters up to whitespace...if you actually wanted an entire line, you would use getline as you had done.)

The error handling is something you'll have to deal with, however. It's possible to tell iostreams to use exception-handling methodology, so that the standard response to encountering a problem (such as a random word in a place where a double was expected) would be to crash your program. But the default is to set a failure flag that you need to test:

cin erratic behaviour

There's nuance to iostream, so you probably want to do some survey of Q&A...I've been learning a bit myself lately while answering/asking here:

Output error when input isn't a number. C++

When to use printf/scanf vs cout/cin?

I have to THANK YOU a lot, HostileFork. I'm totally sorry about the way I asked that etc. I'm learning how to use/share/discuss here... About the concepts, I thank you a lot. I'll try on tomorrow to implement enum & iostream in that piece of code. I'll let you know. Best Regards, J. — gluon, Oct 16 '11 at 22:39
Sure...happy to help people who are seeking to improve their code & methods. (Lots of folks just don't care enough to ask.) But soon it will be your chance to go explaining this stuff to someone else! :) Do bear in mind that part of my point w/the contrast of methods is that you might not need the enum for this particular program. Try to make your program "know where it is" and introduce states only if you need to. — HostileFork says dont trust SE, Oct 16 '11 at 22:46
all is implemented. not in THE BEST way but it should work. Now I have a seg fault 11.. tracking that. I don't know where I didn't allocate correctly memory. — gluon, Oct 17 '11 at 12:34
All is not implemented until it's provably correct!! :-P For inspiration, look at Don Knuth's philosophy on version numbering etc: http://en.wikipedia.org/wiki/Donald_Knuth#Humor — HostileFork says dont trust SE, Oct 18 '11 at 10:03

read multiple lines but especially... parsing them efficiently

1 Answers1