
C++ uses the streamoff type to represent an offset within a (file) stream. It is defined as follows in [stream.types]:

using streamoff = implementation-defined ;

The type streamoff is a synonym for one of the signed basic integral types of sufficient size to represent the maximum possible file size for the operating system. 287)

287) Typically long long.

This makes sense because it allows for seeking within large files (as opposed to using long, which may be only 32 bits wide).
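To see what this looks like on a given implementation, the widths can be compared at compile time. This is only a sketch: the names `streamoff_wider_than_long` and `max_stream_offset` are made up for illustration, and the values they take depend entirely on the platform and standard library in use.

```cpp
#include <ios>
#include <limits>
#include <type_traits>

// [stream.types] says streamoff is a signed basic integral type.
static_assert(std::is_integral<std::streamoff>::value &&
                  std::is_signed<std::streamoff>::value,
              "streamoff is expected to be a signed integral type");

// Illustrative names (not from the standard): is streamoff wider than long here?
// Typically true where long is 32 bits and false where long is 64 bits.
constexpr bool streamoff_wider_than_long =
    std::numeric_limits<std::streamoff>::digits >
    std::numeric_limits<long>::digits;

// Largest offset representable by streamoff on this implementation.
constexpr std::streamoff max_stream_offset =
    std::numeric_limits<std::streamoff>::max();
```

Where `streamoff_wider_than_long` is true, there are offsets that a stream can represent but a `long` cannot.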

[filebuf.virtuals] defines basic_filebuf's function to seek within a file as follows:

pos_type seekoff(off_type off, ios_base::seekdir way, ios_base::openmode which = ios_base::in | ios_base::out) override;

off_type is equivalent to streamoff, see [iostreams.limits.pos]. However, the standard then goes on to explain the function's effects. I'm irritated by the very last sentence, which requires a call to fseek:

Effects: Let width denote a_codecvt.encoding(). If is_open() == false, or off != 0 && width <= 0, then the positioning operation fails. Otherwise, if way != basic_ios::cur or off != 0, and if the last operation was output, then update the output sequence and write any unshift sequence. Next, seek to the new position: if width > 0, call fseek(file, width * off, whence), otherwise call fseek(file, 0, whence).

fseek takes its offset as a long. If off_type and streamoff are defined as long long (as suggested by the standard), calling fseek(file, width * off, whence) could involve a narrowing conversion to long on platforms where long is narrower than long long, leading to potentially hard-to-diagnose bugs. This calls into question the whole rationale for introducing the streamoff type in the first place.
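The kind of bug in question can be illustrated without an actual file. The sketch below simulates a 32-bit long with `std::int32_t`, so it behaves the same on any platform; `offset_as_32bit_long` is a hypothetical helper, not something from the standard or any library. Going through `std::uint32_t` first keeps the wraparound well defined.

```cpp
#include <cstdint>

// Hypothetical helper: what a 64-bit offset becomes if it is squeezed into a
// 32-bit long. std::int32_t stands in for long so the result does not depend
// on the platform the example is compiled on.
constexpr std::int32_t offset_as_32bit_long(std::int64_t off) {
    // Conversion to an unsigned type is modular (keeps the low 32 bits).
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(off));
}

// A 5 GiB offset, which needs more than 32 bits.
constexpr std::int64_t five_gib = 5LL * 1024 * 1024 * 1024;

// Only the low 32 bits survive: 5 GiB silently turns into 1 GiB.
static_assert(offset_as_32bit_long(five_gib) == 1073741824,
              "5 GiB wraps to 1 GiB in 32 bits");
```

A seek performed with the truncated value would succeed and land at the wrong position, which is why such a bug would be hard to diagnose.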

Is this intentional or a defect in the standard?

curiousguy
jceed2
  • I think I see that gcc libstdc++ uses [fseeko64](https://github.com/avsm/src/blob/master/gnu/gcc/libstdc%2B%2B-v3/include/ext/stdio_sync_filebuf.h#L171). – KamilCuk Dec 13 '19 at 20:46
  • Offhand, it doesn't look like `seekoff` necessarily *uses* `fseek` under the hood. Rather, the (presumably familiar?) behavior of `fseek` is used to explain what `seekoff` is doing. – jjramsey Dec 13 '19 at 20:50
  • @jjramsey This was my impression as well. However, the way it is phrased seems to suggest a requirement rather than an explanation. – jceed2 Dec 13 '19 at 20:58
  • That is in a para describing *effects*. – Peter Dec 13 '19 at 21:02
  • @jjramsey I agree that the "Effects" part can reasonably be interpreted to mean that it doesn't actually have to call `fseek` as long as it does something with the same effect. But `fseek` with an offset less than `LONG_MIN` or greater than `LONG_MAX` has no effect, so the explanation is at best incomplete, at least for implementations where `streamoff` is wider than `long`. – Keith Thompson Dec 14 '19 at 01:46

1 Answer


I think that the conclusion that you're drawing from this, that there is a mismatch between C++ streams and fseek that will lead to runtime bugs, is incorrect. The situation seems to be:

  1. On systems where long is 64 bits, streamoff is defined as long, and the seekoff function invokes fseek.

  2. On systems where long is 32 bits but the OS supports 64-bit file offsets, streamoff is defined as long long and seekoff invokes a function called either fseeko or fseeko64 that accepts a 64-bit offset.

Here's a snippet from the definition of seekoff on my Linux system:

#ifdef _GLIBCXX_USE_LFS
    if (!fseeko64(_M_file, __off, __whence))
      __ret = std::streampos(ftello64(_M_file));
#else
    if (!fseek(_M_file, __off, __whence))
      __ret = std::streampos(std::ftell(_M_file));
#endif

LFS stands for Large File Support.

Conclusion: While the standard suggests a definition for streamoff that ostensibly conflicts with the requirement that seekoff invoke fseek, library designers understand that they must call the variant of fseek that accepts the full range of offsets that the OS supports.
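As a rough sketch of that idea, a library could route every seek through a helper that picks the wide variant for the platform. Everything here is an assumption made for illustration: `seek_wide` is a hypothetical name, not libstdc++'s code, and the sketch assumes a POSIX system that provides `fseeko` with a 64-bit `off_t` (on 32-bit glibc that usually means building with `-D_FILE_OFFSET_BITS=64`) or a Windows toolchain that provides `_fseeki64`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

#if !defined(_WIN32)
#include <stdio.h>      // fseeko (POSIX)
#include <sys/types.h>  // off_t
#endif

// Hypothetical helper, not taken from any real library: seek using the
// widest offset type the platform's C library accepts.
// Returns 0 on success and nonzero on failure, like fseek.
inline int seek_wide(std::FILE* file, std::int64_t offset, int whence) {
    assert(file != nullptr);
#if defined(_WIN32)
    // Microsoft C runtime: takes a 64-bit offset directly.
    return _fseeki64(file, offset, whence);
#else
    // Refuse to compile if off_t cannot hold the offset, rather than
    // silently narrowing it.
    static_assert(sizeof(off_t) >= sizeof(std::int64_t),
                  "build with a 64-bit off_t, e.g. -D_FILE_OFFSET_BITS=64");
    return fseeko(file, static_cast<off_t>(offset), whence);
#endif
}
```

For example, `seek_wide(f, 5LL * 1024 * 1024 * 1024, SEEK_SET)` passes the full 5 GiB offset through, where a plain `fseek` on a platform with a 32-bit long could not.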

Willis Blackburn
  • @ypnos I didn't downvote and I find this answer useful. I guess someone downvoted because it misses the point. The problem isn't that there are sane implementations which ignore the standard in this regard, the problem is that the standard needs to be ignored in order for the implementation to be sane. – jceed2 Dec 13 '19 at 20:55
  • `The situation seems to be:` - The situation is that the implementation is not allowed not to call `fseek` in `seekoff`. It must call `fseek`, it doesn't, [standard](http://eel.is/c++draft/filebuf#virtuals-13) says it has to. I can argue that this implementation is invalid. I believe it doesn't answer the question. Och, found [llvm](https://github.com/llvm-mirror/libcxx/blob/master/include/fstream#L950), it calls `fseeko`. – KamilCuk Dec 13 '19 at 21:02
  • Just as an FYI, VC++ calls `_fseeki64` for this function; which also seems to violate what the standard says. – ChrisMM Dec 13 '19 at 21:46
  • This is a case of the implementers realizing the problem and ignoring the standard. I'm glad they did, but the standard really needs to be fixed. – NathanOliver Dec 13 '19 at 22:31
  • Some people are taking the standard too literally. It's not demanding that the implementation literally call a function called `fseek`. Elsewhere the standard describes something as being "as if by calling `fseek(...)`." If it cared so much about literally calling `fseek`, that statement would be different. Seriously, what would you do, if you were implementing a C++ library? Would you insist on calling `fseek` with the lower 32 bits of a 64-bit file offset, because a document tells you to? Would your customers thank you for that? – Willis Blackburn Dec 13 '19 at 23:37
  • If we take the standard literally, it would mean that implementing the C++ I/O library on a system that just doesn't have an API called `fseek` would be impossible. I don't think that was the intent. – Willis Blackburn Dec 13 '19 at 23:45
  • If taking the standard literally leads to absurd conclusions, that's a sign that the standard needs to be corrected. It doesn't mean that it's a major problem, particularly if implementers apply common sense rather than doing exactly what the standard says. (`fseek` is part of the C standard library, and therefore part of the C++ standard library, so your second comment doesn't apply.) If the standard elsewhere uses the phrase "as if by calling `fseek()`", then the obvious fix is to use that same phrase here, though I'd also want a note clarifying that `fseek`'s limitations don't apply. – Keith Thompson Dec 14 '19 at 01:44