1

For a given file pointer (FILE *), is it possible to rapidly determine the distance from the current position to the end of the file?

The time it takes to compute that distance should not depend on the distance itself.

For example, one could subtract fpos_t values, but fpos_t is not an integer type, so it cannot be used arithmetically. Is there an alternative?

Audra Jacot

2 Answers

6

When you first open the file, you can use fseek() to go to the end of the file (but see the note below), then use ftell() to get the position and save this (as the file's size). Then call rewind() to go back to the beginning.

Then, the return value from any later call to ftell() can be subtracted from your saved 'size' to get the offset (distance) from the current position to the file's end:

// Given a FILE* fp that's just been opened:
fseek(fp, 0, SEEK_END);
long int endpos = ftell(fp);
rewind(fp); // Or you can use fseek(fp, 0, SEEK_SET);
//...
// Later in your code:
long int dtoend = endpos - ftell(fp);
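
As a minimal sketch of how the cached-size idea might be packaged with error checks (the names sized_file, sized_open and sized_remaining are hypothetical, not part of any library), one could keep the end position next to the stream. It assumes the file is opened in binary mode, is seekable, and does not grow while it is being read:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: a stream plus its end position, cached once at open time. */
typedef struct {
    FILE *fp;
    long  endpos;
} sized_file;

/* Returns 0 on success, -1 if the file cannot be opened or is not seekable. */
int sized_open(sized_file *sf, const char *path)
{
    assert(sf != NULL && path != NULL);
    sf->fp = fopen(path, "rb");
    if (sf->fp == NULL) return -1;
    if (fseek(sf->fp, 0, SEEK_END) != 0 || (sf->endpos = ftell(sf->fp)) < 0) {
        fclose(sf->fp);
        sf->fp = NULL;
        return -1;
    }
    rewind(sf->fp); /* Back to the start for normal reading */
    return 0;
}

/* One ftell() and a subtraction; returns -1 if ftell() fails. */
long sized_remaining(const sized_file *sf)
{
    assert(sf != NULL && sf->fp != NULL);
    long cur = ftell(sf->fp);
    return (cur < 0) ? -1 : sf->endpos - cur;
}
```

After a successful sized_open(&sf, path), each call to sized_remaining(&sf) costs a single ftell() regardless of how far the end is; the caller still closes sf.fp with fclose() when done.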

But note that implementations are not required to implement SEEK_END: from the cplusplus.com page on fseek linked above:

  • Library implementations are allowed to not meaningfully support SEEK_END (therefore, code using it has no real standard portability).

Just to clarify (following on from some comments): The code above requires that the endpos value be saved for the duration of the file's open/read operations. One could avoid this by seeking to the end and then restoring the current position at any point, but that would be far less efficient. For example, one could write a function to get the distance-to-end at any time:

long int dtoend(FILE *fp)
{
    long int curpos = ftell(fp); // Get current position
    fseek(fp, 0, SEEK_END);      // Go to the file's end
    long endpos = ftell(fp);     // Gets the file's size
    fseek(fp, curpos, SEEK_SET); // Restore previous pos
    return endpos - curpos;      // Return the distance!
}


Note for use with large (> 2GB) files: The above code uses the standard fseek and ftell functions, which use the long int type (often only 32 bits wide) for file positions; to use similar code for larger files, there are a number of (albeit platform-specific) alternatives...

On Windows platforms, using the MSVC (or compatible) compilers, there are the _ftelli64 and _fseeki64 functions, which can be used in virtually the same way as their 'standard' counterparts; for example, by making the following changes to the above code:

//...
    int64_t curpos = _ftelli64(fp); // Get current position
    _fseeki64(fp, 0LL, SEEK_END);   // Go to the file's end
    //... and similar changes elsewhere

On Linux systems, the 64-bit calls are implemented as ftello and fseeko, provided you #define _FILE_OFFSET_BITS 64 before including any system headers.

Other platforms/compilers may implement either (or both) of the above, or have some other, very similar 64-bit replacements.


Note #2 - Error Handling: As pointed out in the comments, calling fseek with SEEK_END as the origin argument can fail in a number of different circumstances; for example, if the file pointer is stdin, if it refers to a pipe stream, or (on some systems) if the file is opened in text mode. To handle such cases, one should really check the return value of the fseek call, which will be non-zero if it failed. So, here is a 64-bit version of the dtoend function with such error handling implemented (note: for compilers other than MSVC or GNU, you will need to add the relevant definition macros for the bigseek and bigtell functions):
#if defined (__GNUC__) // GNU/Linux: this must come BEFORE any system header
    #define _FILE_OFFSET_BITS 64
#endif

#include <stdio.h>
#include <stdint.h>

#if defined (_MSC_VER) // MSVC/Windows...
    #define bigtell _ftelli64
    #define bigseek _fseeki64
#elif defined (__GNUC__) // GNU/Linux...
    #define bigtell ftello
    #define bigseek fseeko
//
// Feel free to add other compiler/platform implementations
//

#else // Unknown platform/compiler - likely to cause warnings!
    #define bigtell ftell
    #define bigseek fseek
#endif

int64_t dtoend(FILE* fp)
{
    int64_t curpos = bigtell(fp);                      // Saves the file's current position
    if (curpos < 0) return -1;                         // Position unknown (e.g. a pipe)
    if (bigseek(fp, 0LL, SEEK_END) != 0) return -1;    // -1 can ONLY be an error condition
    int64_t endpos = bigtell(fp);                      // Retrieve file size (end position)
    if (bigseek(fp, curpos, SEEK_SET) != 0) return -1; // Restore previously saved position
    return endpos - curpos;                            // Subtract to get distance from end
}

From the same cplusplus.com page linked above:

For streams open in text mode, offset shall either be zero or a value returned by a previous call to ftell, and origin shall necessarily be SEEK_SET. If the function is called with other values for these arguments, support depends on the particular system and library implementation (non-portable).

Cubbi
Adrian Mole
3

For example, one could subtract fpos_t values, but fpos_t is not an integer type, so it cannot be used arithmetically. Is there an alternative?

No, you can't.

The FILE structure doesn't know how long the file is, and therefore has no easy way to find this distance. The file is treated like a road - you turn into the road, and drive along, and when the road ends, you'll find out. But there are no signs telling you how much longer the road is.

You could ask the OS separately, with stat or similar. But note that FILE does not even always refer to a file with a defined end - it could be stdin coming from a pipe, and the size completely unknown at the time.
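
To illustrate the "ask the OS" route, here is a minimal sketch that assumes a POSIX system (fileno, fstat and ftello are POSIX, not ISO C). The function name dtoend_stat is hypothetical. It refuses anything that is not a regular file, which covers the pipe/stdin case mentioned above, and it assumes the stream is being read rather than written, since unflushed output would not yet be reflected in the size the OS reports:

```c
#define _POSIX_C_SOURCE 200809L /* Expose fileno, ftello, fstat */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit Linux; must precede all includes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: bytes from the current position to the end of the
   file, or -1 if the stream has no well-defined size or position. */
int64_t dtoend_stat(FILE *fp)
{
    assert(fp != NULL);
    struct stat st;
    if (fstat(fileno(fp), &st) != 0) return -1; /* Could not query the OS */
    if (!S_ISREG(st.st_mode)) return -1;        /* Pipe, terminal, socket, ... */
    off_t curpos = ftello(fp);                  /* Current stream position */
    if (curpos < 0) return -1;
    return (int64_t)st.st_size - (int64_t)curpos;
}
```

Unlike the seek-based versions, this never moves the file position, so there is nothing to restore afterwards.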

alk
Aganju
  • "No, you can't." is more like "No, you can't always", as `fseek(fp, 0, SEEK_END)` does work on many `FILE`. Exceptions include: "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." – chux - Reinstate Monica Apr 17 '20 at 14:13
  • @chux-ReinstateMonica that is not _fast_, as it simply reads the whole file to find its end. I understood the OP's question as looking for a way to avoid reading a complete file just to find its end. – Aganju Apr 17 '20 at 14:57
  • "it simply reads the whole file to find its end" --> that may happen, yet C does not specify that behavior. That is an implementation detail. For streams with known size, the underlying code could simply use `SEEK_END` to indicate a look-up in stream metadata and return the size. – chux - Reinstate Monica Apr 17 '20 at 15:33