I've tested the following C code:

#include <stdio.h>

int main()
{
    FILE * file = fopen("ans.txt", "r+");
    printf("%ld", ftell(file));  // prints 0
    fgetc(file);
    printf("%ld", ftell(file));  // prints -18
    printf("%d", fseek(file, 0, SEEK_CUR)); // -1
    printf("%ld", ftell(file));  // prints 150
    fclose(file);
    return 0;
}

on Windows 10 with MinGW-W64 (gcc version 7.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project)) and Visual Studio 2017 (cl.exe version Microsoft (R) C/C++ Optimizing Compiler Version 19.11.25547).
The ans.txt file is as follows (lines end Unix-style):

line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
line 12
line 13
line 14
line 15
line 16
line 17
line 18
line 19
line 20

But everything works correctly on Arch Linux, or when I open the file in binary mode, or when I change the line endings to 'Windows/Mac OS 9' style.
Does this have something to do with the Windows CRT?

Tuff Contender
  • 3
    If you open a file in non-binary mode on Windows it is supposed to have Windows style line endings. – Bo Persson Nov 13 '17 at 02:34
  • 1
    Did you check that you opened the file successfully? The code isn't obliged to crash if it failed. It would be sensible to print newlines (or spaces) after the numbers, too. – Jonathan Leffler Nov 13 '17 at 02:39
  • What is the significance of `fgets()`? Your code doesn't call it? Or, more precisely, the code you are currently asking us to review does not call `fgets()`. Presumably, this means that what you're showing us is not the same as the code you're testing, and the difference contains part of the trouble. We cannot often deduce what's wrong with the code we cannot see. – Jonathan Leffler Nov 13 '17 at 02:45
  • It's tested somewhere else. I mean the program can recognise where a line ending is, but `ftell` simply returns a wrong position, which seems unreasonable. And that's not the major problem, so I didn't mention it. I'm sorry for not making myself clear. – Tuff Contender Nov 13 '17 at 02:53
  • @BoPersson Thank you for your idea! But since a file with non-Windows line endings cannot be handled correctly in text mode, shouldn't it be treated just as if the line didn't end? Then the position still shouldn't be wrong. – Tuff Contender Nov 13 '17 at 03:01
  • @JonathanLeffler: Presumably if `fopen()` had failed, the following `ftell()` would have returned -1 to indicate an error. But yes, the result of `fopen()` should certainly have been checked. – Keith Thompson Nov 13 '17 at 03:51
  • My testing indicates that the `fseek` call after the first `fgetc` call puts the stream into EOF state if the file content is invalid; but repeated `fgetc` calls without the seek work correctly, and `fseek` works correctly before the first `fgetc` call. `perror` for the `fseek` call gives `Invalid argument` which is not very enlightening – M.M Nov 13 '17 at 03:58
  • The standard doesn't actually say what `fseek` should do for a text stream so it could be argued that this behaviour is compliant. Although bad QoI IMO, I would expect the stream error state to be set – M.M Nov 13 '17 at 04:02
  • @KeithThompson: it would be undefined behaviour. Anything could happen. – Jonathan Leffler Nov 13 '17 at 04:13
  • Your problem starts with `ftell` returning -18. To compute this, the CRT starts with the real file position, part of which was read into the stream buffer but not actually read yet. In text mode, the buffer already has CRLF translated to LF. So it has to assume that an unread LF in its buffer was CRLF on disk. But you don't actually have CRLF on disk, so it ends up subtracting an extra 19 bytes, returning -18 as the computed file stream position. When you do `fseek(file, 0, SEEK_CUR)` it's going to call `SetFilePointerEx` to seek -18 bytes from `FILE_BEGIN`, which fails. – Eryk Sun Nov 13 '17 at 04:17
  • @Tuff Contender: ftell is not returning a wrong position. When the file is opened in text mode, then ftell returns an opaque value that's useful only as an input to a subsequent call to fseek. You cannot expect it to be a byte offset. – Adrian McCarthy Nov 13 '17 at 05:01
  • @JonathanLeffler: You're right. `ftell()` returns -1 on an error, but a null pointer argument isn't an error it's required to detect. – Keith Thompson Nov 13 '17 at 17:44

1 Answer

This is documented on MSDN here:

For streams opened in text mode, fseek and _fseeki64 have limited use, because carriage return-linefeed translations can cause fseek and _fseeki64 to produce unexpected results. The only fseek and _fseeki64 operations guaranteed to work on streams opened in text mode are:

  • Seeking with an offset of 0 relative to any of the origin values.

  • Seeking from the beginning of the file with an offset value returned from a call to ftell when using fseek or _ftelli64 when using _fseeki64.
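
The second guaranteed operation can be sketched as follows. This is only an illustration: `reread_second_line` is a hypothetical helper name, and the sketch assumes the file uses the platform's native line endings (on Windows, CRLF). With the LF-only file from the question, the value returned by `ftell` is already unusable, so this pattern would not help there.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the second line of `path` twice in text mode,
   returning to it with the ftell / fseek(SEEK_SET) round trip.
   Returns 1 if both reads match, 0 if they differ, -1 on any I/O failure. */
int reread_second_line(const char *path)
{
    char first[64], a[64], b[64];
    int result = -1;

    FILE *f = fopen(path, "r");          /* text mode */
    if (f == NULL)
        return -1;

    if (fgets(first, sizeof first, f) != NULL) {
        long mark = ftell(f);            /* treat as an opaque token */
        if (mark >= 0
            && fgets(a, sizeof a, f) != NULL
            && fseek(f, mark, SEEK_SET) == 0   /* only valid use of mark */
            && fgets(b, sizeof b, f) != NULL) {
            result = (strcmp(a, b) == 0);
        }
    }

    fclose(f);
    return result;
}
```

The value in `mark` is never used in arithmetic; it is only handed back to `fseek` with `SEEK_SET`, which is the use the documentation quoted above guarantees.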

Open the file in binary mode... and you get more predictable results:

FILE * file = fopen("ans.txt", "rb+");
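
A slightly fuller sketch of the same idea, with the return values checked as the comments under the question suggest. The helper name `tell_around_fgetc` is made up for this example; it is not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: opens `path` in binary mode, reads one byte, and
   stores the ftell values from before and after the read.
   Returns 0 on success, -1 if any call fails. */
int tell_around_fgetc(const char *path, long *before, long *after)
{
    int status = -1;

    FILE *file = fopen(path, "rb+");     /* binary: no CRLF translation */
    if (file == NULL) {
        perror("fopen");
        return -1;
    }

    *before = ftell(file);
    if (*before != -1L && fgetc(file) != EOF) {
        *after = ftell(file);
        /* In binary mode a zero-offset seek from the current position
           should succeed, unlike in the question's text-mode run. */
        if (*after != -1L && fseek(file, 0, SEEK_CUR) == 0)
            status = 0;
    }

    fclose(file);
    return status;
}
```

For a binary stream, `ftell` reports the number of bytes from the start of the file, so with the question's file the two values should be 0 and 1.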
selbie
  • Yes. That `fgetc` call prior to `fseek` puts the stream out of whack. I don't have time to debug through the CRT sources to see what's wrong. Given that the original source file is unix style line endings, the OP can likely avoid all this translation stuff and associated side effects by opening as binary instead of text. – selbie Nov 13 '17 at 03:58
  • This is not some weird quirk of Windows. The C standard describes the same limitations on fseek and ftell on files opened in text mode. The return value of ftell is essentially an opaque magic value that's only useful as an input to fseek. – Adrian McCarthy Nov 13 '17 at 04:56
  • @selbie, it's unrelated to `fgetc` in general. To get the current position the CRT has to get the OS file position and subtract off the unread bytes from the stream buffer. To do this it has to assume an LF in the buffer is CRLF on disk, since text in the stream's buffer is already translated. The assumption is wrong for a file with Unix-style LF line endings, and we end up attempting to seek to a negative file position, which causes `SetFilePointerEx` to fail. – Eryk Sun Nov 13 '17 at 05:42
  • @eryksun - May I suggest you write your own answer? I believe you have a different and perhaps better perspective than I do. I would be likely to upvote your answer as well. – selbie Nov 13 '17 at 09:18
  • @selbie, you have the only practical solution -- use binary mode. – Eryk Sun Nov 13 '17 at 10:47
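
The arithmetic Eryk Sun describes can be modelled in a few lines. This is a simplified sketch, not the actual CRT source; `estimate_text_position` is a hypothetical name. It assumes the whole 150-byte file was read into the stream buffer at once, so the OS file position is 150, and that after the single `fgetc` the 149 unread bytes contain 19 LF characters (the last line having no trailing newline).

```c
#include <stddef.h>

/* Simplified model (NOT the real CRT code) of a text-mode position estimate:
   start from the OS file position, subtract the unread bytes still in the
   stream buffer, then subtract one more byte per unread LF on the assumption
   that each LF was a CRLF pair on disk. */
long estimate_text_position(long os_position,
                            const char *unread, size_t unread_len)
{
    long position = os_position - (long)unread_len;

    for (size_t i = 0; i < unread_len; i++) {
        if (unread[i] == '\n')
            position--;      /* assumed CR that was stripped on read */
    }
    return position;
}
```

Under those assumptions the estimate is 150 - 149 - 19 = -18, matching the value printed in the question. With real CRLF line endings on disk the extra subtraction would be correct, which is why the problem disappears once the line endings are converted.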