12

I know ANSI C defines fopen, fwrite, fread, and fclose to modify a file's content. However, when it comes to truncating a file, we have to turn to OS-specific functions, e.g., truncate() on Linux or _chsize_s() on Windows. But before we can call those OS-specific functions, we have to obtain the file handle from the FILE pointer by calling fileno, which is also a non-ANSI-C function.

My question is: Is it reliable to continue using the FILE* after truncating the file? I mean, the ANSI C FILE layer has its own buffer and does not know that the file has been truncated from beneath it. If the buffered bytes are beyond the truncation point, will the buffered content be flushed to the file by fclose()?

If there is no guarantee, what is the best practice for combining the file I/O functions with a truncate operation when writing a Windows-Linux portable program?

A similar question: when querying the file size through a file handle returned by fileno, is that the accurate size the file will have when I later call fclose(), without any further fwrite()?

[EDIT 2012-12-11]

Following Joshua's suggestion, I conclude that the best practice currently available is: set the stream to unbuffered mode by calling setbuf(stream, NULL); after that, truncate() or _chsize_s() can work peacefully with the stream.

However, no official documentation, whether for the Microsoft CRT or GNU glibc, seems to explicitly confirm this behavior.
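
For illustration, here is a minimal sketch of the unbuffered setup described in the edit above. The helper name open_unbuffered is hypothetical. It uses setvbuf() with _IONBF, which ISO C defines as equivalent to setbuf(stream, NULL), and it relies on the ISO C rule that the buffering mode is set before any other operation on the stream. Note that this only removes the STDIO write buffer; it does not make the stream aware of a later truncation, so the stream position may still need adjusting afterwards.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open `path` and switch the stream to unbuffered
 * mode before any read, write, or seek touches it.
 * Returns NULL on failure. */
FILE *open_unbuffered(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;

    /* _IONBF requests no buffering; same effect as setbuf(fp, NULL),
     * but setvbuf() reports failure through its return value. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

A caller would use it in place of fopen(), for example open_unbuffered("data.bin", "r+b"), where the file name is only a placeholder.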

Jimm Chen
  • doesn't fileno return the fd of the file rather than the size? – fadedreamz Dec 08 '12 at 18:46
  • @fadedreamz: you are correct that fileno() returns a handle. Furthermore, the handle can be used to query the size with an fstat() call on most systems (last I checked on Windows, the C library provided a working fstat() that called GetFileInformationByHandle). The answer turns out to be that the size reported there can be changed by fclose() unless the stream is unbuffered. – Joshua Dec 09 '12 at 01:49

2 Answers

8

The POSIX way....

ftruncate() is what you're looking for, and it's been in POSIX base specifications since 2001, so it should be in every modern POSIX-compatible system by now.

Note that ftruncate() operates on a POSIX file descriptor (despite its potentially misleading name), not a STDIO stream FILE handle. Note also that mixing operations on the STDIO stream and on the underlying OS calls which operate on the file descriptor for the open stream can confuse the internal runtime state of the STDIO library.

So, to use ftruncate() safely with STDIO it may be necessary to first flush any STDIO buffers (with fflush()) if your program may have already written to the stream in question. This will avoid STDIO trying to flush the otherwise unwritten buffer to the file after the truncation has been done.

You can then use fileno() on the STDIO stream's FILE handle to find the underlying file descriptor for the open STDIO stream, and you would then use that file descriptor with ftruncate(). You might consider putting the call to fileno() right in the parameter list for the ftruncate() call so that you don't keep the file descriptor around and accidentally use it in yet other ways which might further confuse the internal state of STDIO. Perhaps like this (say to truncate a file to the current STDIO stream offset):

/*
 * NOTE: fflush() is not needed here if there have been no calls to fseek() since
 * the last fwrite(), assuming it extended the length of the stream --
 * ftello() will account for any unwritten buffers
 */
if (ftruncate(fileno(stdout), ftello(stdout)) == -1) {
        fprintf(stderr, "%s: ftruncate(stdout) failed: %s\n", argv[0], strerror(errno));
        exit(1);
}
/* fseek() is not necessary here since we truncated at the current offset */

Note also that the POSIX definition of ftruncate() says "The value of the seek pointer shall not be modified by a call to ftruncate()", so this means you may also need to use fseek() to set the STDIO layer (and thus indirectly the file descriptor) either to the new end of the file, or perhaps back to the beginning of the file, or somewhere still within the boundaries of the file, as desired. (Note that the fseek() should not be necessary if the truncation point is found using ftello().)

You should not have to make the STDIO stream unbuffered if you follow the procedure above, though of course doing so could be an alternative to using fflush() (but not fseek()).
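
Putting the whole procedure together (flush, truncate via the descriptor, then reposition), a portable helper might look like the sketch below. The name stream_truncate and the choice to leave the stream at the new end of file are assumptions made for illustration. The Windows branch uses _chsize_s() and _fileno() as mentioned in the question; the POSIX branch uses ftruncate() and fileno(). It assumes the stream was last used for writing or has just been positioned, and that the length fits in a long.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and ftruncate() */
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <io.h>          /* _chsize_s() */
#else
#include <sys/types.h>   /* off_t */
#include <unistd.h>      /* ftruncate() */
#endif

/* Hypothetical helper: truncate the file behind `fp` to `length` bytes
 * and leave the stream positioned at the new end of file.
 * Returns 0 on success, -1 on failure. */
int stream_truncate(FILE *fp, long length)
{
    assert(fp != NULL && length >= 0);

    /* 1. Push buffered writes to the OS so that STDIO does not write
     *    stale data after the truncation. */
    if (fflush(fp) != 0)
        return -1;

    /* 2. Truncate through the underlying descriptor. */
#ifdef _WIN32
    if (_chsize_s(_fileno(fp), (__int64)length) != 0)
        return -1;
#else
    if (ftruncate(fileno(fp), (off_t)length) != 0)
        return -1;
#endif

    /* 3. Re-synchronise the stream position with the new file length. */
    return fseek(fp, length, SEEK_SET) == 0 ? 0 : -1;
}
```

After a successful call the stream can be written to again, and new data is appended at the truncation point.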

Without POSIX....

If you need to stick to strict ISO Standard C, say C99, then you have no portable way to truncate a file to a given length other than zero (0) length. The latest draft of C11 that I have says this in Section 7.21.3 (paragraph 2):

Binary files are not truncated, except as defined in 7.21.5.3. Whether a write on a text stream causes the associated file to be truncated beyond that point is implementation-defined.

(and 7.21.5.3 describes the flags to fopen() which allow a file to be truncated to a length of zero)
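
As a small illustration of the one truncation that strict ISO C does offer, the sketch below empties a file by opening it in "wb" mode and closing it again. The helper name truncate_to_zero is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: empty the file at `path` using only ISO C.
 * Opening with mode "wb" truncates an existing file to zero length,
 * or creates the file if it does not exist.
 * Returns 0 on success, -1 on failure. */
int truncate_to_zero(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}
```

For a stream that is already open, freopen() with a "w" mode should have a similar effect, since it takes the same mode strings as fopen().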

The caveat about text files is there because on silly systems that have both text and binary files (as opposed to just plain POSIX-style content-agnostic files) it is often possible to write a value to the file which will be stored at the position written and which will be treated as an EOF indicator when the file is next read.

Other types of systems may have different underlying file I/O interfaces that are not compatible with POSIX while still providing a compatible ISO C STDIO library. In theory if such a system offers something similar to fileno() and ftruncate() then a similar procedure could be used with them as well, provided that one took the same care to avoid confusing the internal runtime state of the STDIO library.

With regard to querying file size....

You also asked whether the file size found by querying the file descriptor returned by fileno() would be an accurate representation of the file size after a successful call to fclose(), even without any further calls to fwrite().

The answer is: Don't do that!

As I mentioned above, the POSIX file descriptor for a file opened as a STDIO stream must be used very carefully if you don't want to confuse the internal runtime state of the STDIO library. We can add here that it is important not to confuse yourself with it either.

The most correct way to find the current size of a file opened as a STDIO stream is to seek to the end of it and then ask where the stream pointer is by using only STDIO functions.
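
A minimal sketch of that approach follows. The helper name stream_size is hypothetical, and it restores the original position before returning. It assumes a binary stream whose size fits in a long. Strictly speaking, ISO C does not require binary streams to support SEEK_END meaningfully, although POSIX systems and Windows do in practice.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: report the size of the file behind `fp` using
 * only STDIO calls, then restore the original stream position.
 * Returns -1 on failure. */
long stream_size(FILE *fp)
{
    assert(fp != NULL);

    long saved = ftell(fp);          /* remember where we were */
    if (saved < 0)
        return -1;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);           /* offset of end of file */

    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1;
    return size;
}
```

Because everything goes through the stream, bytes still sitting in the STDIO buffer are included in the result.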

Greg A. Woods
  • Thank you for your info. Your reasoning for using fflush() and fseek() is rational. But there still seems to be no standard document that explicitly refers to these requirements, right? I have checked http://linux.die.net/man/2/ftruncate and http://pubs.opengroup.org/onlinepubs/009695399/functions/ftruncate.html and got some bits. – Jimm Chen Dec 13 '12 at 15:08
  • I'm not sure what you mean by "requirements". ISO C defines STDIO and makes it clear that a STDIO stream may be buffered, and has a position in the file. POSIX follows ISO C strictly. POSIX adds `fileno()` and `ftruncate()`. So, if you're able to use POSIX then you can truncate an open file to a specified length. However since you want to re-use your STDIO stream, then you know by implication that you must do something to avoid having to worry about the buffering and positioning of that stream. Ergo you `fflush()` it and reposition (`fseek()`) it after truncation and away you go! – Greg A. Woods Dec 13 '12 at 18:08
  • Thank you. You reiterated the "requirement" again in your previous comment. It would be great if POSIX had said so. POSIX does not emphasize the "requirement" that *we need to call fflush() before ftruncate(),* right? – Jimm Chen Dec 15 '12 at 11:40
  • The "requirement" is only driven by the programmer's desire not to lose data and the knowledge that such loss may be possible. Nothing in the standard mandates that `fflush()` _ever_ be called on buffered streams. Indeed I don't think any standard should mandate such a thing, though perhaps if such a strange "requirement" to truncate open streams and continue to try to use them were ever to become a common programming idiom, then perhaps a future standard might explicitly mention the possibility of such data loss. It's just not "standard" to do what you want to do! – Greg A. Woods Dec 15 '12 at 11:49
  • Indeed the whole idea of mixing operations with a file descriptor and STDIO file streams using the same descriptor is always frowned upon. However when it is necessary, such as when STDIO doesn't offer the kind of operation or control that is only possible with the underlying file descriptor, then one is forced to use `fileno()`, and to be careful not to confuse the STDIO layer. Such care takes greater understanding of the semantics of the STDIO layer, such as understanding concepts such as buffering and file positioning. Such understanding leads to knowing when to call `fflush()`. :-) – Greg A. Woods Dec 15 '12 at 11:55
  • POSIX really already says all it needs to say with respect to what the programmer should be wary of when using `ftruncate()`, notably in this context by saying _The value of the seek pointer shall not be modified by a call to ftruncate()._ Everything else you need to know about STDIO streams is already said by POSIX (and ISO C), including the fact that data from `fwrite()` may be buffered in memory and that you can call `fflush()` or `fclose()` if you want that data to be written to the file. As I said in my first comment, the implications of these defined behaviours should be obvious. – Greg A. Woods Dec 15 '12 at 12:02
  • Now you've made it clear, great. I suggest you integrate your last two comments into your answer so that I can accept it. – Jimm Chen Dec 15 '12 at 12:04
  • OK, I think maybe I have clarified the things that may have been confusing in my original answer. – Greg A. Woods Dec 16 '12 at 00:13
  • Good answer. BUT WAIT... I noticed that you said in the C comment *"fflush() is not needed here -- ftello() accounts for unwritten buffers"* . Do you mean that "fflush() is not needed" is applicable only to stdout??? To my knowledge, stdout, like any normal ``FILE*`` pointer, may point to any file stream corresponding to a disk file. Then how can you omit ``fflush()`` ? – Jimm Chen Dec 16 '12 at 11:36
  • How did you drag `stdout` into there as if it had some special meaning? My comment text does not mention `stdout`. `fflush()` is not needed _IFF_ you use `ftello()` to specify the point of truncation. Nothing else need be mentioned. That's an example only -- read the text proper to get the full understanding. – Greg A. Woods Dec 16 '12 at 22:05
  • I almost accept your answer, Greg, but your code snippet referring to stdout and saying *ftello() accounts for unwritten buffers* makes me baffled again. In case you don't realize it, see my screen shot on your post: http://i.stack.imgur.com/RtbXf.png . Where do you get the idea of "ftello() accounts for unwritten buffers", does it apply to stdout only or arbitrary buffered FILE stream? – Jimm Chen Dec 18 '12 at 14:09
  • Do I have to explain all of STDIO to you? Given the nature of your question I would have thought you had a fair understanding of C and STDIO already. If you're missing some knowledge of details then surely you must have some ability to do your own research to find answers beyond the most strict information necessary to answer your original question. I'm already giving you a hint well beyond the strict answer to your question as to one case where `fflush()` is not needed. You must keep in mind too that sometimes standards give you information by explicitly omitting obvious points. – Greg A. Woods Dec 18 '12 at 19:21
  • There may be literal context contradiction in your current answer so I cannot convince myself I fully understand your meaning. See my comment again please on the image: http://i.stack.imgur.com/1MTeS.png – Jimm Chen Dec 19 '12 at 00:52
  • I don't understand your confusion. You know that STDIO streams _may_ have writes buffered in memory allocated by libc in the process's heap, right? You know that that buffer will only have something in it if a write was done, don't you? That's the first "may". You know also that `ftello()` points to the _current_ position in the stream, thus including any still buffered output, right? Therefore by deduction we can conclude that if we are truncating to the position given by `ftello()` then we need not do the fflush() immediately. That's the "need not" part. Read the paragraph after too! – Greg A. Woods Dec 19 '12 at 06:55
  • [20121219-2059] What? *if we are truncating to the position given by ftello() then we **need not** do the fflush()* ??? Assume this scenario: ① A FILE object (``fp`` points to it) is buffering byte offsets 512~1023, some of which are dirty, ② the user calls fseek to offset 512, ③ the user calls ``ftruncate(fileno(fp), ftello(fp));`` (intending to abandon all bytes beyond offset 512, including the dirty bytes), ④ ``fclose(fp);`` . Q: Will fclose() flush the dirty bytes to the file so that the resulting file is longer than 512? – Jimm Chen Dec 19 '12 at 13:14
  • [20121219-2114] My deduction is: if the user does not ``fflush(fp)`` between ① and ②, the resulting file has good reason to be larger than 512, because the POSIX standard does not make it clear. – Jimm Chen Dec 19 '12 at 13:14
  • Like I said, there's no sense in explaining _all_ of STDIO in the answer to one specific question about how to truncate a file. I can add to the comment in the example code to point out this further assumption. – Greg A. Woods Dec 20 '12 at 00:54
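
The scenario discussed in the comments above can be tried out with a small POSIX-only experiment such as the sketch below. The function name truncate_scenario and its flush_first switch are hypothetical, and the experiment is a variation of the scenario (it writes offsets 0 to 1023 rather than only 512 to 1023). With flush_first set, the final size is 512. Without it, the result depends on whether the implementation's fseek() writes out the pending buffer; as far as I can tell, the POSIX description of fseek() requires unwritten data to be written at that point, but this is worth verifying on the platform in question rather than assuming.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno(), ftello(), ftruncate() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical experiment: returns the final size of `path` in bytes,
 * or -1 on error. `flush_first` selects whether fflush() is called
 * between the write and the seek. */
long truncate_scenario(const char *path, int flush_first)
{
    char block[1024];
    struct stat st;

    assert(path != NULL);
    memset(block, 'x', sizeof block);

    FILE *fp = fopen(path, "w+b");
    if (fp == NULL)
        return -1;

    /* (1) 1024 bytes that may still sit in the STDIO buffer */
    if (fwrite(block, 1, sizeof block, fp) != sizeof block)
        goto fail;
    if (flush_first && fflush(fp) != 0)
        goto fail;

    /* (2) move back to offset 512 */
    if (fseek(fp, 512L, SEEK_SET) != 0)
        goto fail;

    /* (3) truncate at the current stream position */
    if (ftruncate(fileno(fp), ftello(fp)) != 0)
        goto fail;

    /* (4) close, which flushes whatever is still buffered */
    if (fclose(fp) != 0)
        return -1;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;

fail:
    fclose(fp);
    return -1;
}
```

Comparing truncate_scenario(path, 0) with truncate_scenario(path, 1) on a given system shows whether the explicit fflush() makes a difference there.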
3

Isn't an unbuffered write of zero bytes supposed to truncate the file at that point?

See this question for how to set unbuffered: Unbuffered I/O in ANSI C

Joshua
  • I consulted the C99 PDF (sections 7.19.2, 7.19.3, 7.19.8.2); it does not mention the "truncate by fwrite-ing 0 bytes" behavior, so that should belong to the OS-handle layer. This means that even if an fwrite() of 0 bytes calls the low-level write() and causes the file to be truncated, the ``FILE`` layer is still not aware of that situation. That's where my confusion comes from. – Jimm Chen Dec 10 '12 at 01:10
  • As you implied, setting the FILE stream to be unbuffered **before a truncate** may be the best bet. Thank you. – Jimm Chen Dec 10 '12 at 01:22
  • @Jimm Chen: I have access to older specifications. It was not directly specified as it shall do but as a warning not to write zero bytes to an unbuffered file because of the truncation behavior. – Joshua Dec 10 '12 at 03:23
  • Your last comment seems ambiguous, what do you mean by "because of the truncation behavior" ? – Jimm Chen Dec 10 '12 at 03:56
  • It reads along the lines of "don't do X because Y will happen to you" rather than "do X to cause Y". – Joshua Dec 10 '12 at 04:25
  • OK. I got it. You mean: Do not write zero bytes to an unbuffered file stream because your file MAY abruptly get truncated. – Jimm Chen Dec 11 '12 at 00:25
  • Writing zero bytes to a disk file has traditionally had no effect at all, no matter whether it was done at the stdio level, or the system call level. The only system where anything special happens is with the PC-DOS BIOS using INT 21h Function 40h, and even that is a bit magic as it's not clearly specified in most documents. (You're supposed to use INT 21h Function 16h or 3Ch to truncate a file.) – Greg A. Woods Dec 11 '12 at 03:36