
I have two text files, which appear to be identical in a text editor, but my C++ code for reading the files produces different line-counts for each file. I can't figure out where the files are different, or how to accommodate such difference in my C++ code.

Let me explain...

I have two text files, d1.txt and d2.txt. Each contains 100 numbers, 1 per line. When I open either of the files in vim and enter :set list!, there are only 100 lines, each containing a number and the end-of-line character ($) after the last number on each line. In other words, when looking at them in vim, they look identical, with the exception of different precision in the numbers. There is different precision because one file came from MATLAB and the other from Gnumeric.

A quick diff of the files produces the following output (I use bracketed ellipses "[...]" to omit portions in the interest of space):

1,28c1,28
< 0.01218465532007
       [...]
< 0.01327976337895
---
> 0.0121846553200678
       [...]
> 0.0132797633789485
30,100c30,100
< 0.01329705254301
       [...]
< 0.00017832496354
---
> 0.0132970525430057
       [...]
> 0.000178324963543758
\ No newline at end of file

Despite the message about the absence of a newline at the end of the second file (d2.txt), I can't see any difference when examining the last lines of the files in vim, as I mentioned above.

I have created a C++ function readVectorFromFile(std::vector<double>&,const string) that returns the number of lines read from the respective text file. When I read the text files using the code:

std::cout << "d1.txt has " << readVectorFromFile(v1,"./d1.txt") << " lines.\n";
std::cout << "d2.txt has " << readVectorFromFile(v1,"./d2.txt") << " lines.\n";

I get the output:

d1.txt has 99 lines.
d2.txt has 100 lines.

The function is defined in the following way:

int readVectorFromFile(vector<double>& vec, const string& fullFilePathName) {

    int value, numLines;
    char line[10000];
    ifstream inFile;

    /* attempt to open file */
    inFile.open(fullFilePathName.c_str());
    if (inFile.fail()) {
        LOG(FATAL) << "Unable to open file \"" << fullFilePathName.c_str() << "\" for reading.";
    } else {
        cout << "Importing vector from file " << fullFilePathName.c_str() << "\n";
    }

    /* records the number of lines in the input file */
    numLines = static_cast<int>( count(istreambuf_iterator<char>(inFile),
                                       istreambuf_iterator<char>(), '\n') );

    /* start file over from beginning */
    inFile.clear();
    inFile.seekg(0, ios::beg);

    vec.clear(); // clear current vec contents
    vec.reserve(numLines);

    /* read value from each line of file into vector */
    for(int i=0; i<numLines; ++i) {
        inFile.getline(line, 10000);
        vec.push_back( strtod(line,NULL) );
    }

    inFile.close(); // close filestream

    return numLines; // return the number of lines (values) read

}

Why can I not see the difference between these files when I view them in vim? Is there anything fundamentally wrong with the above function that is causing this problem?

synaptik
  • Add a newline to d1.txt. Vim renders a newline even if the file doesn't end in one, and your line count only counts newlines. So it's not surprising that the line counts are off by one. – FDinoff Aug 26 '13 at 00:42
  • The above function will *not* be kind if your file has any double line breaks or blank lines. I *think* I understand what you're trying to do, and I think this could be simplified significantly. – WhozCraig Aug 26 '13 at 00:45
  • @WhozCraig Yes, that function is just for my own personal use, and it depends on the assumptions of no double line breaks or blank lines. – synaptik Aug 26 '13 at 00:47
  • @FDinoff is there any way to force vim to display all ASCII characters including new lines? Seems like there should be, since vim is so powerful and customizable. – synaptik Aug 26 '13 at 03:54
  • Actually, it seems vim adds a newline to the end of the file if it was missing one before (http://stackoverflow.com/questions/1050640/vim-disable-automatic-newline-at-end-of-file), and the newline will be there if you resave the file. `:set list` is probably the closest you can get. – FDinoff Aug 26 '13 at 03:57

1 Answer


Based on your description, one of the two files simply has no newline at the end. You can inspect the files with, e.g., od -c file | less to see the exact content of each file, character by character, including any newline characters.
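The same check can also be done from C++. Below is a minimal sketch, assuming a hypothetical helper named endsWithNewline (not part of the original post) that takes an already opened stream and reports whether its last character is '\n':

```cpp
#include <cassert>   // assert, for quick sanity checks
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file

// Hypothetical helper (name chosen for illustration): returns true when the
// last character in the stream is '\n'. An empty stream returns false.
bool endsWithNewline(std::istream& in) {
    char c = '\0';
    char last = '\0';
    bool sawAnyChar = false;
    while (in.get(c)) {   // read one character at a time until end of stream
        last = c;
        sawAnyChar = true;
    }
    return sawAnyChar && last == '\n';
}
```

Called on a std::ifstream for each of the two files, it should return different answers if a missing final newline really is the only structural difference. Reading every character is wasteful for large files, but it keeps the sketch simple.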

That said, your approach to reading lines can probably be improved: Just read a line, check if it could be read, and process it. This way, there is no need to count the number of line endings up front:

for (std::string line; std::getline(inFile, line); ) {
    vec.push_back(strtod(line.c_str(), NULL));
}
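As a sketch of how this loop could replace the counting logic in the question, here is a hypothetical variant named readVectorFromStream (the name and the stream parameter are assumptions, not from the original code). It returns the number of lines actually read, so a last line without a trailing newline is still counted:

```cpp
#include <cassert>   // assert, for quick sanity checks
#include <cstdlib>   // std::strtod
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical variant of readVectorFromFile that takes an already opened
// stream. Returns the number of lines read; a final line that lacks '\n'
// is still read and counted.
int readVectorFromStream(std::vector<double>& vec, std::istream& in) {
    vec.clear();   // discard any previous contents
    for (std::string line; std::getline(in, line); ) {
        // Like the original, this assumes one number per line and no blank lines.
        vec.push_back(std::strtod(line.c_str(), nullptr));
    }
    return static_cast<int>(vec.size());
}
```

With this approach, the inputs "1.5\n2.5\n" and "1.5\n2.5" both yield two values, which is the behavior the question was after.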

Personally, I would probably just read the numbers in the first place, e.g.:

for (double value; inFile >> value; ) {
    vec.push_back(value);
}

Well, that's not really the way to read a sequence of doubles into a vector but this is:

std::vector<double> vec((std::istream_iterator<double>(inFile)),
                        std::istream_iterator<double>());

(Instead of the extra parentheses, you could use uniform initialization notation in C++11.)
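A minimal sketch of the brace form, assuming C++11 or later and a hypothetical wrapper function named readAllDoubles:

```cpp
#include <cassert>   // assert, for quick sanity checks
#include <istream>
#include <iterator>  // std::istream_iterator
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <vector>

// Hypothetical wrapper (name chosen for illustration): reads every
// whitespace-separated double from the stream into a new vector.
std::vector<double> readAllDoubles(std::istream& in) {
    // Braces avoid the "most vexing parse", so no extra parentheses are needed.
    // The iterators are not convertible to double, so the iterator-range
    // constructor is selected rather than the initializer-list one.
    std::vector<double> vec{std::istream_iterator<double>(in),
                            std::istream_iterator<double>()};
    return vec;
}
```

Because operator>> skips whitespace, this reads the numbers whether or not the file ends with a newline.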

Dietmar Kühl
  • +1 If one-per-line is truly a requirement and there are more than that per line, it's a little more involved (obviously) but not by much. Nice answer. – WhozCraig Aug 26 '13 at 00:52
  • In your 3rd solution using istream_iterator, how could it be modified to read into the std::vector that is passed by reference into the function? (Instead of declaring a new such object inside the function) I'm just curious. – synaptik Aug 26 '13 at 03:50
  • @synaptik: Have a look at the `std::vector` interface! You could, e.g., use `std::vector<double>(begin, end).swap(v);` or `v.assign(begin, end)` (where `begin` and `end` are just the corresponding `std::istream_iterator` objects). – Dietmar Kühl Aug 26 '13 at 03:52
  • That's very cool. Yes, I need to RTFM more. Never really tried to get into the more powerful aspects of the STL, but after this SO post, I think I will. – synaptik Aug 26 '13 at 03:58
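
The `assign` suggestion from the comments can be sketched as follows. The function name readInto and the stream parameter are assumptions made for illustration; the vector is the one passed in by reference, as in the question:

```cpp
#include <cassert>   // assert, for quick sanity checks
#include <istream>
#include <iterator>  // std::istream_iterator
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <vector>

// Hypothetical helper (name chosen for illustration): replaces the contents
// of vec with every double that can be read from the stream.
void readInto(std::vector<double>& vec, std::istream& in) {
    std::istream_iterator<double> begin(in);  // reads doubles from the stream
    std::istream_iterator<double> end;        // default-constructed: end of stream
    vec.assign(begin, end);                   // old contents are discarded
}
```

After the call, vec.size() gives the number of values read, which could stand in for the line count returned by the original function when each line holds exactly one number.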