
I have a rather curious question, not a very practical one. The error (reading a binary file in r mode) is in plain sight, but I'm confused by something else.

Here's the code-

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<stdint.h>

#define BUFFER_LEN 512

typedef uint8_t BYTE;

int main()
{
    FILE* memcard = fopen("card.raw", "r");
    BYTE buffer[BUFFER_LEN];
    int count = 0;
    while (fread(buffer, sizeof(*buffer), BUFFER_LEN, memcard) != 0)
    {
        printf("count: %d\n", count++);
    }
    fclose(memcard);
    return 0;
}

Now, card.raw is a binary file, so this read will go wrong because the file is opened in r mode instead of rb. What I'm curious about is that the loop executes exactly 3 times, and in the final iteration it doesn't even read 512 bytes.

Now if I change that loop to

while (fread(buffer, sizeof(*buffer), BUFFER_LEN, memcard) != 0)
{
    printf("ftell: %ld\n", ftell(memcard));
}

It no longer stops after 3 iterations. In fact, it keeps going until (presumably) the end of the file. The fread count is still messed up: many of the reads do not return 512 elements. But that is most probably due to the file being opened in r mode and the encoding errors that come with it.

ftell shouldn't affect the file itself, so why does including ftell in the loop make it execute more times?

I decided to change the loop a bit more to extract more info-

while ((count = fread(buffer, sizeof(*buffer), BUFFER_LEN, memcard)) != 0)
{
    printf("fread bytes read: %d\n", count);
    printf("ftell: %ld\n", ftell(memcard));
}

With ftell included, this loops just as many times as before, and the first few results look like-

ftell results

Now if I just remove that ftell line completely, it gives me-

without ftell results

Only 3 executions, yet nothing changed.

What's the explanation behind this behaviour?

Note: I know the counts returned by both fread and ftell are probably wrong due to the read mode, that's not my concern though. I'm only curious - why the difference, between including ftell and not including it.

Also, in case it helps, the card.raw file is just the CS50 pset4 "memory card". You can get it with wget https://cdn.cs50.net/2019/fall/psets/4/recover/recover.zip and extracting the downloaded .zip

Edit: I should mention this was on Windows, using the clang tools for VS2019. The command line options (checked from the VS2019 project properties) looked like-

/permissive- /GS /W3 "Debug\" "Debug\" /Zi /Od "Debug\vc142.pdb" /fp:precise /D "_CRT_SECURE_NO_WARNINGS" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /WX- /Gd /MDd /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\Test.pch" /diagnostics:column 

Edit: Also, I did check for ferror inside the loop, with and without ftell, got no errors from it at all. In fact, feof returns 1 after the loop, in both cases.

Edit: I also tried adding a memcard == NULL check right after the fopen, same behaviour.

Edit: To address the answer by @orlp: I did, in fact, check for errors. I should definitely have posted that code though.

while ((count = fread(buffer, sizeof(*buffer), BUFFER_LEN, memcard)) != 0)
{
    if ((err = ferror(memcard)))
    {           
        fprintf(stderr, "Error code: %d", err);
        perror("Error: ");
        return 1;
    }
    printf("fread bytes read: %d\n", count);
    printf("ftell: %ld\n", ftell(memcard));
}
if ((err = ferror(memcard)))
{
    fprintf(stderr, "Error code: %d", err);
    perror("Error: ");
    return 1;

}

Neither of the 2 if statements is ever triggered.

Edit: I thought we got the answer already, it was ftell resetting the EOF. But I changed the loop to-

while ((count = fread(buffer, sizeof(*buffer), BUFFER_LEN, memcard)) != 0)
{
    if ((err = ferror(memcard)))
    {
        fclose(memcard);
        fprintf(stderr, "Error code: %d", err);
        perror("Error: ");
        return 1;
    }
    if (feof(memcard))
    {
        printf("reached before\n");
    }
    printf("fread bytes read: %d\n", count);
    ftell(memcard);
    if (feof(memcard))
    {
        printf("reached after\n");
    }
}

This triggers both the first if(feof) and the second if(feof), so the end-of-file indicator is still set after the ftell call.

As expected though, if I change the ftell to fseek(memcard, 0, SEEK_CUR), the EOF is reset and the reached after is never printed.

Jonathan Leffler
Chase
  • When I tried your code, without the `ftell` (the code you show originally), it executes the loop 7313 times. I tried it on Ubuntu using `gcc`. – lurker Jun 10 '20 at 11:10
  • "Now, `card.raw` is a binary file, so this reading will go wrong due to being read in `r` mode instead of `rb`." No, it will not necessarily go wrong. That depends on your platform. – Cheatah Jun 10 '20 at 11:10
  • Yes I should mention it was on windows. Windows seems to have trouble with binary files in non binary mode – Chase Jun 10 '20 at 11:20
  • Could remove one of the _usual suspects_: Check `memcard != NULL`. – chux - Reinstate Monica Jun 10 '20 at 11:20
  • @chux-ReinstateMonica sorry, I dunno how it ended up there in the question. I did not use `&` in my testing, not to worry. – Chase Jun 10 '20 at 11:20
  • What compiler are you using? What options? – Andrew Henle Jun 10 '20 at 11:27
  • @AndrewHenle just added that to the question, I'm using `clang` for VS2019, the options are just default VS2019 stuff – Chase Jun 10 '20 at 11:28
  • Also just checked `clang` from the command line, `clang ./test.c test.exe`, same behaviour – Chase Jun 10 '20 at 11:30
  • "ftell shouldn't affect the file itself, then why does including ftell in the loop make it execute more times?" --> Bug in compilation/library. – chux - Reinstate Monica Jun 10 '20 at 11:37
  • Just tested on linux with `gcc`, same behaviour across both and no weird `fread`s, it always reads 512 bytes. Wonder if it's `clang` or windows. – Chase Jun 10 '20 at 11:40
  • How does it behave if you add `setbuf( memcard, NULL );` after `fopen()`? – Andrew Henle Jun 10 '20 at 11:41
  • @AndrewHenle just tested, same behaviour. Looks like buffering is not the cause – Chase Jun 10 '20 at 11:44
  • "in the final execution, it doesn't even read 512 bytes." and "The fread count is still messed up." concern me in that the code `while (fread(buffer, sizeof(*buffer), BUFFER_LEN, memcard) != 0)` does not update `count` so these conclusions are based on something other than the posted code. I recommend to post the exact code used at each step to improve the investigation. – chux - Reinstate Monica Jun 10 '20 at 11:46
  • @chux-ReinstateMonica Those statements are based on the code posted right after. Where `count` is updated according to the return value of `fread` – Chase Jun 10 '20 at 11:51
  • Without the `ftell`, it stops when it runs into a `0x1a` character which means EOF in Windows text-mode files. I don't know why calling `ftell` would change this. – interjay Jun 10 '20 at 12:06
  • @interjay Hmmm, very useful comment. – chux - Reinstate Monica Jun 10 '20 at 12:11
  • @interjay that makes a lot of sense. A friend suggested the same too, though they didn't know which exact character it was. – Chase Jun 10 '20 at 12:12
  • Curious, Is `feof()` set after a short read? – chux - Reinstate Monica Jun 10 '20 at 12:15
  • @chux-ReinstateMonica it is!, I believe I mentioned that. `feof` is set in both cases, very strange. – Chase Jun 10 '20 at 12:16
  • I saw the "In fact, feof returns 1 __after__ the loop", wanted to see if code is looping after `feof()` is true. Is `ftell()` incorrectly re-setting the end-of file flag? – chux - Reinstate Monica Jun 10 '20 at 12:18
  • @chux-ReinstateMonica bingo! Just now I included `feof` inside the loop, and that's precisely it. It does get reached and `ftell` seems to reset it. So the combination of `0x1a` and `ftell` resetting it, that's the issue. Care to explain the entire thing in one answer? – Chase Jun 10 '20 at 12:22

2 Answers


As some commenters pointed out, it ran into an EOF, and ftell actually got rid of that EOF. Why? To find the answer, we have to look inside glibc's source code. We can find the source for ftell:

long int
_IO_ftell (FILE *fp)
{
  off64_t pos;
  CHECK_FILE (fp, -1L);
  _IO_acquire_lock (fp);
  pos = _IO_seekoff_unlocked (fp, 0, _IO_seek_cur, 0);
  if (_IO_in_backup (fp) && pos != _IO_pos_BAD)
    {
      if (_IO_vtable_offset (fp) != 0 || fp->_mode <= 0)
    pos -= fp->_IO_save_end - fp->_IO_save_base;
    }
  _IO_release_lock (fp);
  if (pos == _IO_pos_BAD)
    {
      if (errno == 0)
    __set_errno (EIO);
      return -1L;
    }
  if ((off64_t) (long int) pos != pos)
    {
      __set_errno (EOVERFLOW);
      return -1L;
    }
  return pos;
}
libc_hidden_def (_IO_ftell)

weak_alias (_IO_ftell, ftell)

This is the important line:

pos = _IO_seekoff_unlocked (fp, 0, _IO_seek_cur, 0);

Let's find the source for _IO_seekoff_unlocked:

off64_t
_IO_seekoff_unlocked (FILE *fp, off64_t offset, int dir, int mode)
{
  if (dir != _IO_seek_cur && dir != _IO_seek_set && dir != _IO_seek_end)
    {
      __set_errno (EINVAL);
      return EOF;
    }

  /* If we have a backup buffer, get rid of it, since the __seekoff
     callback may not know to do the right thing about it.
     This may be over-kill, but it'll do for now. TODO */
  if (mode != 0 && ((_IO_fwide (fp, 0) < 0 && _IO_have_backup (fp))
            || (_IO_fwide (fp, 0) > 0 && _IO_have_wbackup (fp))))
    {
      if (dir == _IO_seek_cur && _IO_in_backup (fp))
    {
      if (_IO_vtable_offset (fp) != 0 || fp->_mode <= 0)
        offset -= fp->_IO_read_end - fp->_IO_read_ptr;
      else
        abort ();
    }
      if (_IO_fwide (fp, 0) < 0)
    _IO_free_backup_area (fp);
      else
    _IO_free_wbackup_area (fp);
    }

  return _IO_SEEKOFF (fp, offset, dir, mode);
}

Basically, it just does some checks then calls _IO_SEEKOFF, so let's find its source:

/* The 'seekoff' hook moves the stream position to a new position
   relative to the start of the file (if DIR==0), the current position
   (MODE==1), or the end of the file (MODE==2).
   It matches the streambuf::seekoff virtual function.
   It is also used for the ANSI fseek function. */
typedef off64_t (*_IO_seekoff_t) (FILE *FP, off64_t OFF, int DIR,
                      int MODE);
#define _IO_SEEKOFF(FP, OFF, DIR, MODE) JUMP3 (__seekoff, FP, OFF, DIR, MODE)

So basically, ftell ends up calling a function which is the equivalent of fseek(fp, 0, SEEK_CUR). And the fseek documentation says: "A successful call to the fseek() function clears the end-of-file indicator for the stream." That's why ftell changes the behavior of the program.

Aplet123
  • Hmmm, as the C standard has no mention of `ftell()` clearing the _end-of-file indicator_, this appears to be non-compliant behavior. The `fread()` call after a short read should return 0 as the _end-of-file indicator_ should still be set. – chux - Reinstate Monica Jun 10 '20 at 12:38
  • Interesting, it seems like `ftello` is allowed to reset the error on a file, but not `ftell`...? https://pubs.opengroup.org/onlinepubs/9699919799/functions/ftell.html – Chase Jun 10 '20 at 12:42
  • I do not find support for "ftello is allowed to reset the error on a file". Both `ftello()` and `ftell()` can set `errno`, but that is not the _error indicator_ for the stream as reported by `ferror()`. IAC, there is no implied ability for either function to clear the end-of-file indicator. – chux - Reinstate Monica Jun 10 '20 at 12:51
  • Bad news, `ftell` isn't resetting the EOF - I put another `if(feof(memcard))` after the `ftell`, and it still got triggered. `fread` is somehow continuing reading even after that "fake" EOF, We're back at square one..? – Chase Jun 10 '20 at 13:21

fread() has

The fread function returns the number of elements successfully read, which may be less than nmemb if a read error or end-of-file is encountered.

When count < BUFFER_LEN, OP reported feof() was true - as expected.

What is unexpected is that a following fread() returns non-zero.

IMO, a non-compliant library.

(OP reports new info, so this answer now incomplete.)

It appears ftell(), incorrectly IMO, reset the end-of-file indicator for the stream, allowing additional reads to occur.

chux - Reinstate Monica
  • Bad news, `ftell` isn't resetting the EOF - I put another `if(feof(memcard))` after the `ftell`, and it still got triggered. `fread` is somehow continuing reading even after that "fake" EOF, We're back at square one..? – Chase Jun 10 '20 at 13:20
  • @Chase "ftell isn't resetting the EOF" --> Well that is good as it should not do so. C has "The byte input functions read characters from the stream as if by successive calls to the fgetc function." so after a short `fread()`, the next `fread()` should return 0 as that call is like 512 `fgetc()`. And `fgetc()` has "**If the end-of-file indicator for the stream is set**, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF." Curious, after a short `fread()`, what does `fgetc()` return? And then `feof(), ferror()`? – chux - Reinstate Monica Jun 10 '20 at 13:48
  • Without the `ftell`, i.e shortread at the 3rd iteration, `fgetc` returns -1, `ferror` is still 0 and `feof` is 1 – Chase Jun 10 '20 at 13:58
  • With `ftell`, if I do `fgetc` *before* `ftell` after a short read, it gives me -1. If I do it after `ftell`, suddenly no -1 – Chase Jun 10 '20 at 13:59
  • With `ftell()`, what is `feof(), ferror()` before and after `fgetc()`? – chux - Reinstate Monica Jun 10 '20 at 14:00
  • `ferror` gives 0, `feof` gives 1, in all situations, before `ftell`, after `ftell`, before `fgetc` and after `fgetc` – Chase Jun 10 '20 at 14:03
  • Hmmm, `ftell()` is clearing _something_ and `fread()` is not compliant. `fread()` should return 0 when `feof()` is true. BTW, with "suddenly no -1", what was the value? 26? – chux - Reinstate Monica Jun 10 '20 at 14:04