C2x, 7.21.9.2 The fseek function:

Synopsis

#include <stdio.h>
int fseek(FILE *stream, long int offset, int whence);

Why does fseek have long int offset instead of long long int offset?

It seems that on operating systems with the LLP64 or ILP32 data model (e.g. Microsoft Windows), where long int is 32 bits wide, the maximum offset of 2147483647 (just under 2 GiB) may be insufficient.

Note: POSIX's lseek has off_t offset, where off_t "isn't very rigorously defined".

Boann
pmor
  • 3
    That's why every C library usually has 64-bit extensions, to handle 64-bit offsets. MSVC, for example, has [`_fseeki64`](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fseek-fseeki64?view=msvc-170). Regarding `lseek`, Linux has [`lseek64`](https://man7.org/linux/man-pages/man3/lseek64.3.html) which uses the guaranteed 64-bit type `off64_t`. – Some programmer dude Feb 07 '22 at 15:08
  • 4
    It's an unfortunate historical precedent. Clearly (at least, with 20/20 hindsight) it would have been better to have defined `fseek` and `ftell` in terms of `off_t`, or something. – Steve Summit Feb 07 '22 at 15:15
  • 1
    We're stuck with these kludges and compromises, seemingly forever. Back in the early seventies, the original `seek` call gave way to `lseek`, as Unix learned how to deal with 32-bit (!) file sizes. Fast forward to today, and we've got this litany of `stat64` and `_fseeki64` and `lseek64` calls. ("`lseek64`" is a particularly ghastly misnomer; it should clearly be "`seek64`" or "`llseek`".) – Steve Summit Feb 07 '22 at 15:16
  • 2
    I can see, some 10 years from now, people asking, *"Why is it `long long int` (64-bit) and not `long long long int` (128-bit)?"* – Adrian Mole Feb 07 '22 at 15:24
  • 2
    @AdrianMole Hopefully we will switch to qubits before it happens. – Eugene Sh. Feb 07 '22 at 15:26
  • @SteveSummit `llseek` might be confusing as Linux already has `_llseek`, which splits a 64-bit offset into two 32-bit args. It might be ghastly, but given that we already have `lseek` we probably want to keep `lseek` as _part_ of the replacement name(s). When I'm looking at a code base and asking the question: _Where are all the places seeking is done?_ I'd like to be able to do a `grep` on `lseek` and get a match on either `lseek` or `lseek64`. On 64-bit systems `lseek` works by default. For 32-bit, we can do: `#define _LARGEFILE*_SOURCE` and `lseek` works – Craig Estey Feb 07 '22 at 15:47
  • 1
    @SteveSummit Hence the existence of `fseeko` and `ftello` in POSIX.1-2001. – Ian Abbott Feb 07 '22 at 15:52
  • 6
    `long long int` was added in C99, but `fseek` was already defined to use `long int` offsets before C99. – Ian Abbott Feb 07 '22 at 15:58
  • @AdrianMole `long long int` supports 9.22 EB (exabytes). Should be enough for the next 50 years I guess. Example: 1 hour of 512K (sic!) video takes ~400 TB. Not sure though about the 512K video. – pmor Feb 14 '22 at 15:02

1 Answer

The C Standard was formalized in 1990, when most hard drives were smaller than 2 GB. The prototype for fseek() was already in broad use with an offset of type long, and 32 bits seemed large enough for all purposes, especially since the corresponding system call already used the same offset type. The committee did add fgetpos() and fsetpos() for exotic file systems where a simple long offset could not carry all the information needed for seeking, but kept the fpos_t type opaque.
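
To illustrate the fgetpos()/fsetpos() pair, here is a minimal sketch that saves a position and returns to it without ever looking inside fpos_t. The helper name peek_char is made up for this example; it is not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a standard function): return the next
   character of the stream without consuming it. The position is saved
   and restored with fgetpos()/fsetpos() instead of ftell()/fseek().
   Returns EOF at end of file or on error. */
static int peek_char(FILE *stream)
{
    assert(stream != NULL);

    fpos_t pos;                      /* opaque: no arithmetic is possible */
    if (fgetpos(stream, &pos) != 0)
        return EOF;

    int c = fgetc(stream);

    if (fsetpos(stream, &pos) != 0)  /* go back to the saved position */
        return EOF;
    return c;
}
```

Because fpos_t is opaque, this only works for returning to a position that was previously recorded; it cannot express "seek to byte N", which is why it never replaced fseek() for random access.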

After a few years, when 64-bit offsets became necessary, many operating systems added 64-bit versions of the system calls, and POSIX introduced fseeko() and ftello() to provide a high-level interface for larger offsets. These extensions are no longer necessary on 64-bit versions of common operating systems (Linux, macOS), where long is 64 bits wide, but Microsoft decided to keep its long, or more precisely LONG, type at 32 bits on 64-bit Windows, entrenching this issue along with others, such as size_t being wider than unsigned long. This very unfortunate decision has plagued C developers on Win64 platforms ever since and forces them to use non-portable APIs for large files.
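
In practice, code that has to build on both kinds of platform usually hides the difference behind a small wrapper. The following is only a sketch under some assumptions: the names seek64 and tell64 are invented here, the Windows branch relies on the Microsoft CRT extensions _fseeki64/_ftelli64, and the other branch assumes a POSIX system where defining _FILE_OFFSET_BITS before any include makes off_t 64 bits wide (as glibc does).

```c
/* These two macros must come before every #include. On glibc,
   _FILE_OFFSET_BITS=64 widens off_t on 32-bit targets and is harmless
   where off_t is already 64 bits. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifndef _WIN32
#include <sys/types.h>
/* Assumption of this sketch: off_t can hold a 64-bit offset. */
static_assert(sizeof(off_t) >= 8, "off_t is narrower than 64 bits");
#endif

/* Hypothetical wrapper: fseek() with a 64-bit offset. */
static int seek64(FILE *stream, int64_t offset, int whence)
{
#ifdef _WIN32
    return _fseeki64(stream, offset, whence);      /* Microsoft CRT */
#else
    return fseeko(stream, (off_t)offset, whence);  /* POSIX.1-2001 */
#endif
}

/* Hypothetical wrapper: ftell() with a 64-bit result, -1 on error. */
static int64_t tell64(FILE *stream)
{
#ifdef _WIN32
    return _ftelli64(stream);
#else
    return (int64_t)ftello(stream);
#endif
}
```

With something like this in place, the rest of the program can call seek64(f, offset, SEEK_SET) everywhere and the platform split stays in one file.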

Changing the fseek and ftell prototypes would break compatibility with existing software and create more problems than it solves, so it will not happen.

Some other historical shortcomings are even more surprising, such as the prototype for fgets:

char *fgets(char * restrict s, int n, FILE * restrict stream);

Why they used int instead of size_t is a mystery: back in 1990, int and size_t had the same size on most platforms, and passing a negative value made no sense anyway. Again, this inconsistent API is here to stay.
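
If a buffer size is held in a size_t, as it normally is, one way to live with the int parameter is to clamp it in a thin wrapper. This is a sketch only; fgets_sz is a made-up name, and clamping to INT_MAX is a choice made for this example rather than anything the standard prescribes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper (not a standard function): like fgets(), but the
   buffer size is a size_t. Sizes above INT_MAX are clamped, so a single
   call never stores more than INT_MAX bytes including the terminator. */
static char *fgets_sz(char *s, size_t n, FILE *stream)
{
    assert(s != NULL && stream != NULL);

    if (n == 0)
        return NULL;                 /* no room even for the terminator */

    int limit = n > (size_t)INT_MAX ? INT_MAX : (int)n;
    return fgets(s, limit, stream);
}
```

The caller can then write fgets_sz(buf, sizeof buf, fp) without a cast and without a signed/unsigned conversion warning.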

chqrlie
  • 1
    Nice point about `fgets`, although, anybody trying to read a line longer than 32767 characters (or, these days, 2147483647 characters!) probably has other problems, anyway. :-) – Steve Summit Feb 07 '22 at 17:11
  • 1
    @SteveSummit: text files with lines longer than 32K are commonplace: minified JS files for example. System generated XML files can easily break the 2GB barrier, a challenge for `getline()` users :) – chqrlie Feb 07 '22 at 17:24
  • 2
    "plagues C developers on Win64 platforms ever since and forces them to use non portable APIs for large files." --> I suspect this is a deliberate [choice](https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish). – chux - Reinstate Monica Feb 07 '22 at 17:47
  • @chux-ReinstateMonica: we are on the same page :) – chqrlie Feb 07 '22 at 17:50
  • @chqrlie Files with "lines" longer than 32K are not, IMHO, text files, and no sane person (again, IMHO) reads or processes them a line at a time. – Steve Summit Feb 07 '22 at 18:09
  • @SteveSummit Agree about sane people, yet automated systems can and do create exceptionally long strings and it is those that stress code. Still I generally agree that any input larger than some N is more likely nefarious than good and deserves error handling rather than allow outside forces to compel code to handle huge strings. – chux - Reinstate Monica Feb 07 '22 at 18:15
  • @chux-ReinstateMonica Re: "a deliberate choice": thanks for the link, interesting! – pmor Feb 07 '22 at 23:24
  • `This very unfortunate decision plagues C developers on Win64 platforms ever since and forces them to use non portable APIs for large files.` same to programmers on 32-bit \*nix. They all have to use other solutions – phuclv Feb 08 '22 at 11:38
  • @phuclv: except programmers that must deal with huge files on 32-bit unix know they are in a legacy world and can use standard POSIX functions. Modern unix systems do not have this issue. – chqrlie Feb 08 '22 at 13:48
  • no that's not a legacy world, there are still plenty 32-bit MCUs running Linux and they'll never disappear – phuclv Feb 09 '22 at 03:04
  • @phuclv: I agree, it is an embedded world where files rarely exceed 2GB, but it is a POSIX world with a standard solution. – chqrlie Feb 09 '22 at 08:36