I am playing with ifstream to familiarize myself with it. I am trying to use tellg to report the current position in the file, but it is giving me unexpected results.

The idea is to:

  1. open the file
  2. print position of the file
  3. read a character from the file
  4. print position of the file
  5. read a character from the file
  6. print position of the file
  7. close the file.

The original file looks like this (Windows format):

file.txt

aA
bB
cC
dD
eE
fF

Running my code, I get these results:

position: 0
got: a
position: 6
got: A
position: 7

However, for this file:

file.txt

aAbBcCdDeEfF

I get these results:

position: 0
got: a
position: 1
got: A
position: 2

Here is the code I used:

test.cpp (MinGW, GCC 5.3)

#include <fstream>
#include <iostream>

using namespace std;

static char s[10];

int main(int argc, char **argv)
{
    ifstream f("file.txt");    
    cout << "position: " << f.tellg() << "\n";
    f.read(s, 1);
    cout << "got: " << s << "\n";
    cout << "position: " << f.tellg() << "\n";
    f.read(s, 1);
    cout << "got: " << s << "\n";
    cout << "position: " << f.tellg() << "\n";    
    f.close();

    return 0;
}

Here are the hex editor views of the two text files, respectively:

[Image: hex editor view of the original file]
[Image: hex editor view of the modified file]

I expected both files to yield the positions 0, 1, 2; however, this was not the case for the original file.

Can somebody explain what is happening here?

Questions:

  1. What should I do to get the correct file position?

Answer: use the ifstream("file.txt", ios_base::in | ios_base::binary) constructor instead of the ifstream("file.txt") constructor.

  2. What causes f.tellg to return the strange values 0, 6, 7 instead of the expected 0, 1, 2 by default?

Possible explanation (testing the answer by Holt below):

f.tellg in this code resolves to f.rdbuf()->pubseekoff(0, ios_base::cur, ios_base::in), which produces the values 0, 6, 7 (but only if ios_base::binary is not specified at construction/open).

#include <fstream>
#include <iostream>

using namespace std;

static char s[10];

int main(int argc, char **argv)
{
    ifstream f("file.txt");    
    cout << "position: " << f.rdbuf()->pubseekoff(0, ios_base::cur, ios_base::in) << "\n";
    f.read(s, 1);
    cout << "got: " << s << "\n";
    cout << "position: " << f.rdbuf()->pubseekoff(0, ios_base::cur, ios_base::in) << "\n";
    f.read(s, 1);
    cout << "got: " << s << "\n";
    cout << "position: " << f.rdbuf()->pubseekoff(0, ios_base::cur, ios_base::in) << "\n";
    f.close();

    return 0;
}

Note: passing ios::in | ios::binary as the second argument to the ifstream constructor makes both files behave as expected, but I would also like to know what causes the default mode to produce these strange tellg values.

Note on the difference from "tellg() function give wrong size of file?": that question has ios::binary set and uses seek, whereas this question covers both the case with ios::binary and the case without it, and does not use seek. Overall, the two questions have different contexts, and knowing the answer to that question does not answer this one.

Dmytro
  • Compare the files in a hex editor. Maybe there's a BOM or some weird unseen crap at the start of the file, that the OS reads and skips for you (unless you open the file in binary mode). – tambre May 31 '17 at 17:56
  • added the hex views at the bottom. Both first 2 characters are the same, so I don't see how it would impact seekg. – Dmytro May 31 '17 at 18:00
  • What OS and compiler are you using? Please include the exact versions. – tambre May 31 '17 at 18:03
  • Windows, mingw, gcc 5.3.0 – Dmytro May 31 '17 at 18:04
  • Maybe you actually need to open file as binary to read byte-by-byte? `std::ifstream f("file.txt", std::ios::binary);` – user7860670 May 31 '17 at 18:04
  • passing `ifstream::in | ifstream::binary`/`ios::in | ios::binary` as second argument to the ifstream constructor indeed fixes this. I am still curious what causes the default mode to behave this way. – Dmytro May 31 '17 at 18:07
  • @Dmitry When a file is open in text mode, the return value of `tellg()` is unspecified (it is only meant to be used as an argument to `seekg()`). – Holt May 31 '17 at 18:09
  • @Holt Quote the appropriate section in the standard and turn it into an answer! – tambre May 31 '17 at 18:11
  • Possible duplicate of [tellg() function give wrong size of file?](https://stackoverflow.com/questions/22984956/tellg-function-give-wrong-size-of-file) – gsamaras May 31 '17 at 18:19
  • @gsamaras Yes, though I think the answer here is better! – underscore_d May 31 '17 at 18:21

1 Answer

There is no such thing as a "wrong" result for a value returned by tellg(): when the file is opened in text mode, the return value is unspecified (i.e. it has no meaning except that it can be used as input for seekg()).

Basically, a call to tellg() on a basic_fstream falls back to the std::ftell [1] function, which says (C standard, §7.21.9.4 [File positioning functions], emphasis is mine):

long int ftell(FILE *stream);

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. [...] For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

[1] tellg() falls back to rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::in), which falls back to basic_filebuf::seekoff(0, std::ios_base::cur, std::ios_base::in), which then falls back to std::ftell().

Holt
  • I tested your answer by calling pubseekoff directly, and it indeed gives the same faulty values, but only if the constructor of ifstream is not given `ios_base::in | ios_base::binary`. Still curious what's happening behind the scenes to produce these 0, 6, 7 results rather than 0, 1, 2. – Dmytro May 31 '17 at 18:41
  • @Dmitry This is highly OS (and compiler) dependent - In particular on Windows, you have some translation in text mode, e.g. for `'\n'`. I tried your code with your text file, and I did not even get the same results as you (I got 7 8 and 2 3). This is probably because I added an extra `'\n'` at the end of each file, which would confirm the impact of the translation of `'\n'` on Windows. – Holt Jun 01 '17 at 05:50