I am writing a C++ function that reads the nth column of a tab-delimited text file. Here is what I have so far:

typedef unsigned int  uint;


inline void fileExists (const std::string& name) {
    if ( access( name.c_str(), F_OK ) == -1 ) {
        throw std::string("File does not exist!");
    }
}

size_t bimNCols(std::string fn) {
    try {
        fileExists(fn);
        std::ifstream in_file(fn);
        std::string tmpline;
        std::getline(in_file, tmpline);
        std::vector<std::string> strs;
        strs = boost::split(strs, tmpline, boost::is_any_of("\t"), boost::token_compress_on);
        return strs.size();
    } catch (const std::string& e) {
        std::cerr << "\n" << e << "\n";
        exit(EXIT_FAILURE);
    }
}

typedef std::vector<std::string> vecStr;

vecStr bimReadCol(std::string fn, uint ncol_select) {
    try {
        size_t ncols = bimNCols(fn);
        if(ncol_select < 1 or ncol_select > ncols) {
            throw std::string("Your column selection is out of range!");
        }

        std::ifstream in_file(fn);
        std::string tmpword;
        vecStr colsel; // holds the column of strings
        while (in_file) {
            for(int i=1; i<ncol_select; i++) {
                in_file >> tmpword;
            }
            in_file >> tmpword;
            colsel.push_back(tmpword);
            in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
        return colsel;

    } catch (const std::string& e) {
        std::cerr << "\n" << e << "\n";
        exit(EXIT_FAILURE);
    }
}

The problem is, in the bimReadCol function, at the last line, after

in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

in_file.good() still evaluates to true. So, suppose I have a text file test.txt like this:

a 1 b 2
a 1 b 2
a 1 b 2

bimReadCol("test.txt", 3) would return a vector (b, b, b, b), with an extra element. Any idea how to fix this?

qed
  • Note: Please use return values and fewer exceptions, and if you use an exception, derive it from std::exception –  Jul 24 '14 at 17:57
  • @DieterLücking could you please give a reference on return values vs exceptions? I have no idea how to use return value for the same purpose as exceptions. – qed Jul 24 '14 at 18:12
  • No, but in my opinion `fileExists` should certainly not throw an exception. –  Jul 24 '14 at 18:18
  • oh, that one. but why not? – qed Jul 24 '14 at 18:19
  • @DieterLücking I just found that creating an exception class with a customized error message is too much of a hassle, so I figured, why not just throw an error message directly? Could you please explain why this is bad? – qed Jul 24 '14 at 18:32
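
On the last question in the comments above: attaching a message does not require writing a new class, because `std::runtime_error` already derives from `std::exception` and carries a string. The following is only a sketch of both styles mentioned in the comments. The names `requireReadable` and `tryRequireReadable` are hypothetical, and the check opens the file with `std::ifstream` instead of calling `access`, which is an assumption and not what the question's code does.

```cpp
#include <exception>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: throws std::runtime_error (which derives from
// std::exception) when the file cannot be opened for reading.
inline void requireReadable(const std::string& name) {
    std::ifstream in(name.c_str());
    if (!in) {
        throw std::runtime_error("Cannot open file: " + name);
    }
}

// Hypothetical wrapper showing the "return value" style: reports success
// as a bool and hands the message back through an out-parameter.
inline bool tryRequireReadable(const std::string& name, std::string& error) {
    try {
        requireReadable(name);
        return true;
    } catch (const std::exception& e) {
        error = e.what();
        return false;
    }
}
```

A caller that catches `const std::exception&` will also see exceptions thrown by the standard library, which a `catch (const std::string&)` handler would miss.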

2 Answers

The usual solution for line oriented input is to read line by line, then parse each line:

std::string line;
while ( std::getline( in_file, line ) ) {
    std::istringstream parser( line );
    //  Stop as soon as the selected column has been read into tmpword.
    for ( uint i = 1; parser >> tmpword && i < ncol_select; ++ i ) {
    }
    if ( parser ) {
        colsel.push_back( tmpword );
    }
    //  No need for any ignore.
}

The important thing is that you must absolutely test after the input (be it from in_file or parser) before you use the value. A test before the value was read doesn't mean anything (as you've seen).
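
To make the "test after the input" rule concrete, here is a minimal self-contained sketch of the same line-by-line approach. The function name `readColumn` is hypothetical, and it takes any `std::istream` (an assumption made so it can be tried on an in-memory string as well as a file). Like the question's code, it splits on any whitespace, not strictly on tabs.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the col-th (1-based) whitespace-separated
// field of every line in `in`. Lines with fewer than `col` fields are skipped.
std::vector<std::string> readColumn(std::istream& in, std::size_t col) {
    std::vector<std::string> out;
    if (col == 0) {
        return out;  // columns are 1-based; nothing to select
    }
    std::string line;
    while (std::getline(in, line)) {            // test after reading the line
        std::istringstream parser(line);
        std::string word;
        std::size_t read = 0;
        while (read < col && parser >> word) {  // test after reading each field
            ++read;
        }
        if (read == col) {                      // the requested field was really read
            out.push_back(word);
        }
    }
    return out;
}
```

Passing a `std::ifstream` opened on the question's `test.txt` with `col` set to 3 should give exactly three entries, whether or not the last line ends with a newline.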

luk32
James Kanze
  • what is `if(parser)` testing? – qed Jul 24 '14 at 18:15
  • It's a cast of `std::istringstream` to bool; it's documented [`here`](http://en.cppreference.com/w/cpp/io/basic_ios/operator_bool). – luk32 Jul 24 '14 at 18:21
  • @luk32 that is no cast, but a conversion –  Jul 24 '14 at 18:26
  • Fair enough. But ain't it called type cast operator? E.g: [1](http://stackoverflow.com/questions/8239356/can-a-cast-operator-be-explicit), [2](http://msdn.microsoft.com/en-us/library/ts48df3y.aspx). The difference lies only in being explicit or implicit? – luk32 Jul 24 '14 at 18:33
  • @luk32 Within the standard, at least, a cast is one of several different operators, which cause a conversion. A conversion may occur as the result of a cast (explicit conversion) or otherwise (implicit conversion). – James Kanze Jul 25 '14 at 08:03
  • @qed `parser` is a stream. Streams convert implicitly to `bool` (earlier to `void const*`, which could be used as a `bool`), evaluating to `true` if there has been no error, and `false` otherwise. And while I normally try to avoid implicit conversions, this one is so ubiquitous that anything else would seem strange. (This is exactly the same conversion which occurs in the `while` condition: `std::getline` returns a reference to the istream, which is implicitly converted to `bool`.) – James Kanze Jul 25 '14 at 08:06
Ok, I got it. The last line of the text file does not contain a newline; that's why in_file evaluates to true at the last line.

I think I should calculate the number of lines of the file, then replace while(in_file) with a for loop.

If someone has a better idea, please post it and I will accept.

Update

The fix turns out to be rather simple: just check whether tmpword is empty.

vecStr bimReadCol(std::string fn, uint ncol_select) {
    try {
        size_t ncols = bimNCols(fn);
        if(ncol_select < 1 or ncol_select > ncols) {
            throw std::string("Your column selection is out of range!");
        }

        std::ifstream in_file(fn);
        vecStr colsel; // holds the column of strings
        std::string tmpword;
        while (in_file) {
            tmpword = "";
            for(int i=1; i<=ncol_select; i++) {
                in_file >> tmpword;
            }
            if(tmpword != "") {
                colsel.push_back(tmpword);
            }
            in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
        return colsel;

    } catch (const std::string& e) {
        std::cerr << "\n" << e << "\n";
        exit(EXIT_FAILURE);
    }
}

As @James Kanze has pointed out, in_file still evaluates to true after the last line even when that line ends with a newline. But since we are then at the end of the file, the next read into tmpword fails and leaves it empty (it is reset to "" at the top of each iteration), so we are fine as long as we check for that.

qed
  • More likely, the last line of the text file does contain a new line. The problem you're encountering is that ignoring up to the next `'\n'` doesn't set any error status, even if you reach the end of file. The error is only set once you try to read something beyond the end of file. – James Kanze Jul 24 '14 at 17:55
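
The behaviour described in this comment can be observed directly. The sketch below uses an in-memory `std::istringstream` in place of the file (an assumption made so the example is self-contained). `StreamProbe` and `probeLastLine` are hypothetical names introduced only for illustration.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical probe: records the stream state at the two points discussed
// above, using an in-memory stream in place of the file.
struct StreamProbe {
    bool goodAfterIgnore;   // state right after ignore() on the last line
    bool failedOnNextRead;  // state after trying to read past the end
};

StreamProbe probeLastLine(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    in >> word;  // read the first field of the only line
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    StreamProbe p;
    p.goodAfterIgnore = in.good();
    in >> word;  // nothing left: this extraction fails
    p.failedOnNextRead = in.fail();
    return p;
}
```

For a last line that ends with `'\n'`, the stream is still good after `ignore` and only the following extraction fails. If the trailing newline is missing, `ignore` runs into the end of the input and sets `eofbit`, so `good()` is false, but `while (in_file)` tests only `fail()` and still lets the loop run one more time.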