I'm writing a little C++ program for myself. At the beginning of it, I read a file all the way to the bottom, and later on, right before the program ends, I need to read that file again from the beginning.
My question is: is it more efficient to keep the file open during execution (even though I won't be using it) and just rewind it when I need it again, or should I close it the first time and then open it again when I need it?
Edit: Just to clarify, my question is not only about the specific project I'm working on. That project is really small (less than 300 lines of code), so there won't be any noticeable performance difference. I'm asking about opening, closing, and "rewinding" files in general, so it's applicable to other, bigger projects where performance and memory may actually matter.
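To make the comparison concrete, here is a rough sketch of the two options (using std::ifstream; "data.txt" is just a placeholder name):

```cpp
#include <fstream>
#include <string>

int main() {
    std::string line;
    std::ifstream in("data.txt");   // placeholder file name

    while (std::getline(in, line)) { /* first read, all the way down */ }

    // Option A: keep the stream open and rewind it before the second read.
    in.clear();                     // reset the EOF flag set by the first read
    in.seekg(0, std::ios::beg);     // rewind to the beginning
    while (std::getline(in, line)) { /* second read */ }

    // Option B would instead be: in.close(); ...; in.open("data.txt"); ...
}
```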

- If you have a method, have you tried both and tested the time it takes? – Fantastic Mr Fox Oct 14 '15 at 20:16
- First rule of optimizing C++ code: Measure everything. – Baum mit Augen Oct 14 '15 at 20:16
- If you are not using the file, you should probably close it so you are not tying it up. – NathanOliver Oct 14 '15 at 20:17
- Why do you read it twice in the first place? If you keep it open, then you're not expecting a change in content... – sara Oct 14 '15 at 20:18
- In general, it's faster to leave it open, as a "rewind" usually just resets the file offset. But specifics matter. For what it's worth, since in most cases all the file metadata will be cached, the difference might not even be measurable. – Andrew Henle Oct 14 '15 at 20:18
- Unfortunately, rewinding a file efficiently often involves OS-specific methods, so the best generic C++ code can do is often to close the file. The way caches interact with the file is implementation-specific, and rewind versus reopen may vary in performance by OS; YMMV. – Michael Shopsin Oct 14 '15 at 20:29
3 Answers
If you close and reopen the file, the OS definitely needs to update its lock on the file and the list of resources (open files) held by your process. Furthermore, close and open are two system calls (kernel calls), and system calls are not cheap: every system call requires virtual-address translation.
Closing the file can (if anything has changed) force the cache to be written to the hard disk, which means a seek time of about 15 ms (a physical movement of the platter). It can be even worse in the case of a network drive.
After closing the file, some properties need to be updated, and a file-system watcher may be launched.
An antivirus scan may be triggered after closing the file, depending on the filename, the path, and the antivirus brand.
Furthermore, closing the file is a risk: you may not be able to open it again because another process has grabbed it. For example, Dropbox reads every file in the Dropbox folder after a change, so closing and reopening a file does not always work in a Dropbox folder (Dropbox may be faster). And who knows how users will use your application; users are inventive, and they share files you didn't think of.
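If you want to see the difference on your own system, here is a minimal, unscientific timing sketch (it assumes a placeholder "data.txt" that already exists, and only measures the stream operations themselves):

```cpp
#include <chrono>
#include <fstream>
#include <iostream>

int main() {
    using clock = std::chrono::steady_clock;
    constexpr int N = 10000;

    std::ifstream in("data.txt");               // placeholder file name

    auto t0 = clock::now();
    for (int i = 0; i < N; ++i) {               // rewind in place: no syscalls needed
        in.clear();
        in.seekg(0, std::ios::beg);
    }
    auto t1 = clock::now();
    for (int i = 0; i < N; ++i) {               // close and reopen: two syscalls each time
        in.close();
        in.open("data.txt");
    }
    auto t2 = clock::now();

    using us = std::chrono::microseconds;
    std::cout << "rewind: " << std::chrono::duration_cast<us>(t1 - t0).count() << " us\n"
              << "reopen: " << std::chrono::duration_cast<us>(t2 - t1).count() << " us\n";
}
```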

You might be able to measure a gained efficiency in the range of a few nanoseconds if you fseek to the beginning of the file, but I don't think it is worth it when you are only dealing with a single file.
As others have said: try to find other areas of the code that you can optimize.
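For reference, a minimal sketch of the fseek approach, using C stdio from C++ (the file name is just a placeholder):

```cpp
#include <cstdio>

int main() {
    std::FILE* f = std::fopen("data.txt", "r");  // placeholder file name
    if (!f) return 1;

    int c;
    while ((c = std::fgetc(f)) != EOF) { /* first pass */ }

    std::fseek(f, 0, SEEK_SET);  // rewind; std::rewind(f) would also clear error flags
    while ((c = std::fgetc(f)) != EOF) { /* second pass */ }

    std::fclose(f);
}
```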

- Indeed. If ultimate speed is the goal and the file is small, just keep the contents in memory. If the file is too large to keep in memory, the performance difference between keeping the file open and reopening it will almost certainly be smaller than the random fluctuations in how fast the file is read. – Andrew Henle Oct 14 '15 at 20:26
- It may well be worth holding on to the file to reduce the risk of errors during the second pass, however, depending on the circumstances. Also, as a Windows developer who has recently been doing I/O optimizations, I fear that "a few nanoseconds" may be a more than generous estimate (for the record, I've been seeing best-case access times on the order of 30 µs). – doynax Oct 14 '15 at 20:28
As with all performance issues, the best choice varies widely. Measure both implementations against a reasonable data set and take it from there.
As a design choice, it may be simpler to cache the contents of the file in memory once it has been read the first time; then there is no need to re-read the contents at all. If the modified content is required, then again, cache the modified data to forgo the second read.
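For example, a minimal sketch of that caching approach, assuming the file fits comfortably in memory (the file name is a placeholder):

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Read the whole file into a string once; every later "pass" reuses the cached copy.
std::string read_all(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

int main() {
    const std::string contents = read_all("data.txt");  // placeholder file name
    std::cout << contents.size() << " bytes cached\n";
    // First pass: parse `contents` here.
    // ... rest of the program ...
    // Second pass: parse `contents` again; no file I/O at all.
}
```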
