I had a strange bug with file access (on Windows) in my program, where some file index calculations would not point to the correct symbol in the file. I could isolate the error to the way the seek position increments and tested it in two loops, once the C way and once the C++ way.

In both, we open a file and then print the seek position of each character together with its string representation. I wrote a function to_string(char) that returns the character itself, or its name if it is whitespace or otherwise invisible.

First the C way:

    FILE* fptr = fopen("test.txt", "r");
    while(!feof(fptr)){
        int seek = ftell(fptr);
        char c = fgetc(fptr);
        const char* s = to_string(c);
        printf("seek: %d, char: %s\n", seek, s);
    }
    fclose(fptr);

In the following output, you will notice that the seek position jumps by two after every 'LF', so the positions 19, 34, 37 and 39 never appear. I also tried using fread() instead of fgetc(), but the output does not change.

Output:
seek: 0, char: f
seek: 1, char: n
seek: 2, char: 'SP'
seek: 3, char: m
seek: 4, char: a
seek: 5, char: i
seek: 6, char: n
seek: 7, char: 'SP'
seek: 8, char: -
seek: 9, char: >
seek: 10, char: 'SP'
seek: 11, char: a
seek: 12, char: s
seek: 13, char: c
seek: 14, char: i
seek: 15, char: i
seek: 16, char: 'SP'
seek: 17, char: {
seek: 18, char: 'LF'    <--- 
seek: 20, char: 'SP'    <--- what is this
seek: 21, char: 'SP'
seek: 22, char: 'SP'
seek: 23, char: 'SP'
seek: 24, char: r
seek: 25, char: e
seek: 26, char: t
seek: 27, char: u
seek: 28, char: r
seek: 29, char: n
seek: 30, char: 'SP'
seek: 31, char: 1
seek: 32, char: ;
seek: 33, char: 'LF'   <---
seek: 35, char: }      <---
seek: 36, char: 'LF'   <---
seek: 38, char: 'LF'   <---
seek: 40, char: 'EOF'

Now the C++ way:

    std::ifstream file("test.txt");
    while(!file.eof()){
        int seek = file.tellg();
        char c = file.get();
        std::cout << "seek: " << std::to_string(seek) << ", char: " << to_string(c) << std::endl;
    }

In the following output, you will see that the seek position does not jump by two at every occurrence of 'LF'. Instead, it jumps by 5 between the first and the second character, rather than incrementing by one.

Output:
seek: 0, char: f    <---
seek: 5, char: n    <--- why?
seek: 6, char: 'SP'
seek: 7, char: m
seek: 8, char: a
seek: 9, char: i
seek: 10, char: n
seek: 11, char: 'SP'
seek: 12, char: -
seek: 13, char: >
seek: 14, char: 'SP'
seek: 15, char: a
seek: 16, char: s
seek: 17, char: c
seek: 18, char: i
seek: 19, char: i
seek: 20, char: 'SP'
seek: 21, char: {
seek: 22, char: 'LF'
seek: 23, char: 'SP'
seek: 24, char: 'SP'
seek: 25, char: 'SP'
seek: 26, char: 'SP'
seek: 27, char: r
seek: 28, char: e
seek: 29, char: t
seek: 30, char: u
seek: 31, char: r
seek: 32, char: n
seek: 33, char: 'SP'
seek: 34, char: 1
seek: 35, char: ;
seek: 36, char: 'LF'
seek: 37, char: }
seek: 38, char: 'LF'
seek: 39, char: 'LF'
seek: 40, char: 'EOF'

Do you have a clue on why those file streams behave so strangely?

Thanks for your help.

  • Please don't tag C++ questions as C. That "C way" code is obviously, and only C++. It's understood that C++ allows coding "C style" using C functions, so separate tagging is not necessary. – tadman May 13 '21 at 18:14
  • Looks like your file uses CRLF encoding, but C++ folds that down to just LF for convenience. Open in binary mode if you care about the details. Windows also has some quirky non-standard file type identification bytes it jams in at the start of files. – tadman May 13 '21 at 18:16
  • [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – Some programmer dude May 13 '21 at 18:18
  • And why `std::to_string(seek)`? The output operators `<<` have several overloads that can handle numeric types well. – Some programmer dude May 13 '21 at 18:19
  • Playing `tellg` games on files opened in text mode can give surprising results. – PaulMcKenzie May 13 '21 at 18:19
  • Oh and please read [the help pages](http://stackoverflow.com/help), especially ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) Your question about "other libraries" is off-topic. – Some programmer dude May 13 '21 at 18:21
  • Ok, thank you for your quick answers. @tadman I changed it to full C and re-compiled as C ... same output though. I will try it in binary mode and give an update. Thanks. – Tobias Wallner May 13 '21 at 18:21
  • `ftell` and `fseek` are only going to give consistent results on binary files. – Mark Ransom May 13 '21 at 18:23
  • @Some programmer dude std::to_string(seek) or << is not the point of my question – Tobias Wallner May 13 '21 at 18:26
  • @PaulMcKenzie noticed that – Tobias Wallner May 13 '21 at 18:26
  • @MarkRansom Thanks will try it in binary mode. – Tobias Wallner May 13 '21 at 18:26
  • @Someprogrammerdude Thanks for pointing out that this is an inappropriate question. I removed it. – Tobias Wallner May 13 '21 at 18:30
  • @bradgonesurfing Seems like the article you are pointing out refers to the same problem. – Tobias Wallner May 13 '21 at 18:32
  • I figured that is so. Seems like tellg returns *something* that is a token but there is no guarantee that the value represents a count of character positions just that the token identifies a unique character position. Subtle difference but significant. – bradgonesurfing May 13 '21 at 18:35
  • Thank you for all your answers. I just tried it in binary mode and, as some of you expected, the file contains 'CR' + 'LF' pairs that text mode silently folds to 'LF'. – Tobias Wallner May 13 '21 at 18:41
  • @TobiasWallner You should open the file in a binary editor. Binary mode is basically what you see in the binary/hex editor. – PaulMcKenzie May 13 '21 at 18:43
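
To make the suggestions from the comments concrete, here is a minimal sketch of reading the file in binary mode. It is not code from the question: the names `ByteAt` and `read_bytes_with_offsets` are hypothetical and only serve the illustration. The file is opened with `"rb"` so that no CRLF translation happens, the result of `fgetc` is kept in an `int` so that `EOF` can be told apart from a real byte, and the loop tests the return value of the read instead of calling `feof` beforehand.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper type: one byte plus the offset it was read at.
struct ByteAt {
    long offset;
    unsigned char value;
};

// Hypothetical helper: reads every byte of `path` in binary mode ("rb").
// In binary mode no CRLF -> LF folding takes place, so each 'CR' shows up
// as its own byte and ftell() advances by one per byte read.
std::vector<ByteAt> read_bytes_with_offsets(const char* path) {
    std::vector<ByteAt> result;
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return result;  // empty result if the file cannot be opened
    }
    for (;;) {
        long offset = std::ftell(f);  // position before the read
        int c = std::fgetc(f);        // int, so EOF differs from every byte value
        if (c == EOF) {
            break;                    // stop on the failed read, not on feof()
        }
        result.push_back({offset, static_cast<unsigned char>(c)});
    }
    std::fclose(f);
    return result;
}
```

Printing `offset` and `value` for each element of the returned vector should give consecutive positions, with 'CR' and 'LF' listed as two separate bytes. Those are the same bytes a hex editor would show for the file.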
