
So I've seen lots of solutions on this site and tutorials about reading in from a text file in C++, but have yet to figure out a solution to my problem. I'm new at C++ so I think I'm having trouble piecing together some of the documentation to make sense of it all.

What I am trying to do is read numbers from a text file while ignoring comments, which are marked with "#". An example file would look like:

#here is my comment
20 30 40 50
#this is my last comment
60 70 80 90

My code reads the numbers fine when there aren't any comments, but I don't understand stream parsing well enough to ignore the comments. It's kind of a hack solution right now.

/////////////////////// Read the file ///////////////////////
std::string line;
if (input_file.is_open())
{
    //While we can still read the file
    while (std::getline(input_file, line))
    {
        std::istringstream iss(line);
        float num; // The number in the line

        //while the iss is a number 
        while ((iss >> num))
        {
            //look at the number
        }
    }
}

else
{
    std::cout << "Unable to open file";
}
/////////////////////// done reading file /////////////////

Is there a way I can incorporate comment handling with this solution or do I need a different approach? Any advice would be great, thanks.

Ninja_Panda
  • `line.assign(line.substr(0,line.find('#')));` (as the first statement in the while-loop) would be one way of making the necessary change quickly. – jogojapan Nov 09 '12 at 07:44
  • This is very, very simple. You say you don't understand the code above well enough to modify it. I think you need to spend some time getting that understanding before you try anything else. – john Nov 09 '12 at 08:03
  • Have you tried it with comments present in the file? As written, the code will ignore any portion of a line after the first portion that is not a valid number, which includes the comments. – Bart van Ingen Schenau Nov 09 '12 at 08:53
  • Okay, so I think @BartvanIngenSchenau is right, which was my intuition at first, but I'm getting some weird behavior that I now think is unrelated to the parsing. What I'm not showing here is that I'm using the file input to draw a bunch of geometry, and sometimes I get a red line drawn across the screen. My thought was that the code might be doing something strange and reading the comments, but now I think it's something else. So I'm going to explore some of the other elements, thanks everyone. – Ninja_Panda Nov 09 '12 at 22:11
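
The suggestions in the comments above can be sketched roughly as follows. This is a minimal sketch, assuming the numbers are collected into a `std::vector<float>`; the helper name `read_numbers` is hypothetical and does not appear in the question. Each line is truncated at the first `#` before parsing, so both full-line and trailing comments are dropped. Even without the truncation, `iss >> num` stops at the first token that is not a number, so the explicit cut mainly makes the intent visible.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every number in `input`, ignoring anything
// from a '#' to the end of that line.
std::vector<float> read_numbers(std::istream& input)
{
    std::vector<float> numbers;
    std::string line;
    while (std::getline(input, line))
    {
        // find() returns npos when there is no '#', and substr(0, npos)
        // then keeps the whole line.
        std::istringstream iss(line.substr(0, line.find('#')));
        float num;
        while (iss >> num)
        {
            numbers.push_back(num);
        }
    }
    return numbers;
}
```

Passing the already opened `input_file` to this function would replace the nested loops in the question.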

3 Answers

If the # always appears in the first column of a line, just test whether the line starts with # like this:

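while (std::getline(input_file, line))
{
    // Compare against the character '#', not the string literal "#":
    // "#" is a const char*, and comparing it with line[0] does not compile.
    if (!line.empty() && line[0] != '#')
    {
        std::istringstream iss(line);
        float num; // The number in the line

        //while the iss is a number 
        while ((iss >> num))
        {
            //look at the number
        }
    }
}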

It is wise, though, to trim leading and trailing whitespace from the line first, as shown for example here: Remove spaces from std::string in C++
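
As a rough sketch of that advice, the check below skips leading whitespace before testing for `#`, so indented comment lines are recognized too. The function name `is_comment_or_blank` is made up for this example, and treating blank lines the same as comments is an assumption.

```cpp
#include <string>

// Hypothetical helper: true when the first non-whitespace character of
// `line` is '#', or when the line has no non-whitespace characters at all.
bool is_comment_or_blank(const std::string& line)
{
    const std::string::size_type first = line.find_first_not_of(" \t\r");
    return first == std::string::npos || line[first] == '#';
}
```

In the loop above, the test would then become `if (!is_comment_or_blank(line))`.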

Community
Chris
  • If it's not the first character, it's a simple one-liner to use `std::find` to find it, and `std::string::erase` to remove it and everything following. – James Kanze Nov 09 '12 at 08:44
  • See I tried an if statement like that before, and I get the error: `comparison between pointer and integer ('int' and 'const char*')`. – Ninja_Panda Nov 09 '12 at 22:05
  • If you replace `getline(input_file, line)` with `getline(input_file >> std::ws, line)`, your comment lines can contain leading whitespace. – Micha Wiedenmann Nov 12 '12 at 07:33

If this is just for one-off use, then for line-oriented input like yours the simplest solution is to strip the comment from the line you just read:

line.erase( std::find( line.begin(), line.end(), '#' ), line.end() );

A more generic solution would be to use a filtering streambuf, something like:

class FilterCommentsStreambuf : public std::streambuf
{
    std::istream& myOwner;
    std::streambuf* mySource;
    char myCommentChar;
    char myBuffer;

protected:
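    int underflow()
    {
        int const eof = traits_type::eof();
        int results = mySource->sbumpc();
        // On a comment character, discard everything up to (but not
        // including) the end of the line; the '\n' itself is passed on.
        if ( results == myCommentChar ) {
            while ( results != eof && results != '\n') {
                results = mySource->sbumpc();
            }
        }
        if ( results != eof ) {
            // Expose the character through a one-character get area.
            myBuffer = results;
            setg( &myBuffer, &myBuffer, &myBuffer + 1 );
        }
        return results;
    }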

public:
    FilterCommentsStreambuf( std::istream& source,
                             char comment = '#' )
        : myOwner( source )
        , mySource( source.rdbuf() )
        , myCommentChar( comment )
    {
        myOwner.rdbuf( this );
    }
    ~FilterCommentsStreambuf()
    {
        myOwner.rdbuf( mySource );
    }
};

In this case, you could even forgo getline:

FilterCommentsStreambuf filter( input_file );
double num;
while ( input_file >> num || !input_file.eof() ) {
    if ( ! input_file ) {
        //  Formatting error, output error message, clear the
        //  error, and resynchronize the input---probably by
        //  ignore'ing until end of line.
    } else {
        //  Do something with the number...
    }
}

(In such cases, I've found it useful to also track the line number in the FilterCommentsStreambuf. That way you have it for error messages.)

James Kanze

An alternative to the "read a line and parse it as a string" approach is to use the stream itself as the incoming buffer:

while(input_file)
{
    int n = 0;

    char c = '\0';
    if(!(input_file >> c)) break; // skips spaces and reads the first non-blank; stops at end of input

    if(c == '#')
    {
        while(c!='\n' && input_file) input_file.get(c);
        continue; //may be not soooo beautiful, but does not introduce useless dynamic memory
    }

    //c is part of something other than a comment, so give it back to parse it as a number
    input_file.unget(); //< this is what all the fuss is about!
    if(input_file >> n)
    { 
        // look at the number
        continue;
    }

    // something else, but not an integer, is there ....
    // if you cannot recover, the loop will exit 
}
Emilio Garavaglia
  • Now there's a good example of how to write unreadable code. Not to mention that the last `if` is incorrect. (If you get that far, it will always be true, _unless_ there's been a hardware error.) – James Kanze Nov 09 '12 at 08:42
  • @JamesKanze: According to the standard, the "failbit" (not "badbit", that's different) is set when the extraction operation fails (for example because you expect to read a number but there is a non-numeric character at the input). The point here is not to "close the code" but to leave it open for analyzing further cases. I restyled the code, but the point here is not to make it elegant but to avoid introducing unnecessary dynamic memory alloc/dealloc (typical with strings and stringstreams). – Emilio Garavaglia Nov 09 '12 at 12:52
  • In the original code, you didn't test `failbit` until after input had failed (`input_file >> n` evaluated false). And if input fails, either `failbit` or `badbit` must be set; `badbit` is set if and only if there is an exception from the streambuf (which is in most cases never). So when you tested `failbit`, it was almost certainly set. Once there has been failure, you _can_ test `eof()`, to decide whether it was because there was nothing more to read, or because there was an error in the input format (both of which cause `failbit` to be set). – James Kanze Nov 09 '12 at 12:59
  • Using `std::string` will typically not cause overhead which is measurable compared to the overhead of reading from a file. – James Kanze Nov 09 '12 at 13:00
  • And finally, a loop which is twenty lines long, with `continue` all over the place, is totally unreadable. (I can't actually think of any context where `continue` would result in readable code.) – James Kanze Nov 09 '12 at 13:01
  • @JamesKanze: yes, that's why I removed the "if", having no other value, since I cannot imagine what other handling to suggest in case of a non-int input (maybe skip the line, or just skip up to the first blank... It depends on what "semantics" is hidden behind those numbers). In that I agree. I was just pointing out that it could not be just a "hardware failure" as per your first comment, as you described better in your reply. Thanks for pointing it out! – Emilio Garavaglia Nov 09 '12 at 13:05
  • @JamesKanze: continue is like break, which is like return, which is like goto. In my experience they are always unreadable to anyone who doesn't want to read them, and perfectly readable to anyone who knows them. A less religious approach may help you in understanding code that uses a style other than the one you like most. Continue is a perfectly legitimate mechanism to avoid deep nesting and to avoid introducing fake states. There are cases where this adds value. – Emilio Garavaglia Nov 09 '12 at 13:10
  • @JamesKanze: a file is not necessarily always bound to a "disk file". let the OP to decide the trade-off. – Emilio Garavaglia Nov 09 '12 at 13:12
  • If you feel the need for `continue`, your loops and your functions are too complex. It took me a while to understand how your code worked, and even longer to figure out that it was incorrect. `continue` is good for obfuscation, but nothing else. – James Kanze Nov 09 '12 at 14:15
  • And if you're worried about dynamic allocation in `string`, etc., see my second solution. There's not an `std::string` in sight. And it's very readable _if_ you are familiar with the way `streambuf` works. (It's not otherwise, and I would consider such knowledge reasonably advanced C++; I don't think it's among the first things a programmer should learn.) – James Kanze Nov 09 '12 at 14:17
  • @JamesKanze: +1. Questioning `continue` is like questioning where you put the braces. Just religion. Everyone has their own. The use of a specific streambuf is interesting, but I don't find it so "readable". Not because of "coding style", but because it goes into a normally hidden side of stream i/o. (How many know why `underflow` is used here? And `sbumpc`?) But for very general-purpose code that's definitely the way to go (you can even chain different *preprocessors*, so it can be very flexible). – Emilio Garavaglia Nov 09 '12 at 14:54
  • The use of a custom streambuf _is_ readable _if_ you know the streambuf protocol. Knowledge of the streambuf protocol is not basic C++, however, and there are many very good C++ programmers who are not familiar with it. (On the other hand, it is well worth learning, as it opens the door to several useful patterns.) – James Kanze Nov 12 '12 at 09:13