6

Using cstdio, what is the safest way of overwriting a file? 'Safe' in this case means there's no chance the file will end up incomplete or corrupted: the file will either be completely overwritten, or it will remain the old file should something go awry.

I imagine the best way to do this would be to create a temporary intermediate file, then overwrite the old file once that intermediate is complete. If that actually is the best way, though, there are a few other problems that seem possible, albeit rare.

  • How would I know to use this other file should the program quit while overwriting?
  • How would I know to NOT use the other file should the program quit during its creation?
  • How would I know the original file or the intermediate is in an undefined state (since it may fail in a way that remains readable but the data it contains is subtly wrong)?

I imagine there's a single good practice for this, but I haven't been able to find it. This is for saved game data; there's only one file, and the entire file is overwritten every time, so there are no partial overwrites or appends to worry about.

Anne Quinn
  • Don't overwrite it. Pick another name. If that worked then rename files. – Hans Passant Aug 15 '13 at 20:12
  • @HansPassant That seems to be exactly what is suggested by the second paragraph, albeit with imprecise terminology. –  Aug 15 '13 at 20:13
  • @HansPassant Reading the documentation on rename(), it mentions that the operation may either fail or succeed if the new name already exists, and which one happens depends on the implementation. – Anne Quinn Aug 15 '13 at 20:24
  • I wrote it up in [this answer](http://stackoverflow.com/a/6468131/17034). If you are doing this in Windows then you can use ReplaceFile(). – Hans Passant Aug 15 '13 at 20:28

4 Answers

6

As others have said, keep the existing file around, and write to a fresh file. If it's very important (that is, the user can't possibly recover the information), make sure that there is a "backup" file around as well (e.g. if your program saves abc.config, leave an abc.old.config or abc.backup [if you want guarantees that the name works everywhere, .cfg and .bak may be better choices]).

When you write the file, put some sort of end marker in it, so that you can be sure that the file is complete. If you also want to detect "user editing" of the file, add a checksum of the content (sha1, md5 or similar). If the end marker isn't there, or the checksum is wrong, then you know that the file is "bad", so don't use it and go for the backup.
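One way to frame the data so that a truncated or altered file is recognised on load is sketched here. It is only an illustration under assumptions: the trailer layout, the `#END` marker and the helper names `seal`/`unseal` are made up for this example, and FNV-1a is swapped in for sha1/md5 purely to keep the code dependency-free. A non-cryptographic checksum like this catches accidental damage, not a determined user.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical layout: [payload][8 hex digits of checksum][end marker].
static const std::string kEndMarker = "#END";

// FNV-1a, 32-bit: a simple non-cryptographic checksum (stand-in for sha1/md5).
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Format a 32-bit value as exactly 8 lowercase hex digits.
std::string to_hex8(std::uint32_t v) {
    static const char digits[] = "0123456789abcdef";
    std::string out(8, '0');
    for (int i = 7; i >= 0; --i) {
        out[i] = digits[v & 0xFu];
        v >>= 4;
    }
    return out;
}

// Append the checksum and the end marker; write the result to the file.
std::string seal(const std::string& payload) {
    return payload + to_hex8(fnv1a(payload)) + kEndMarker;
}

// Returns true and fills `payload` only if the end marker is present
// and the stored checksum matches the content.
bool unseal(const std::string& blob, std::string& payload) {
    const std::size_t trailer = 8 + kEndMarker.size();
    if (blob.size() < trailer) return false;
    if (blob.compare(blob.size() - kEndMarker.size(), kEndMarker.size(),
                     kEndMarker) != 0) {
        return false;  // file was cut short before the marker was written
    }
    const std::string body = blob.substr(0, blob.size() - trailer);
    if (blob.substr(body.size(), 8) != to_hex8(fnv1a(body))) return false;
    payload = body;
    return true;
}
```

On load, a `false` from `unseal` is the signal to ignore that file and try the backup instead.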

  1. Write the new content to a temporary file (e.g. ofstream fout("abc.tmp");)
  2. Delete the backup file (if it exists) (e.g. remove("abc.bak");)
  3. Rename the now old file to the backup name (e.g. rename("abc.cfg", "abc.bak");)
  4. Rename the new file to the old one (e.g. rename("abc.tmp", "abc.cfg");)

For ALL steps (in particular writing the actual data), check for errors. You need to decide where it is OK to get errors and where it is not (removing a file that doesn't exist is OK, for example, but if a rename doesn't work you should probably stop, or you may end up with something bad).
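With cstdio only, the four steps and their error checks might look roughly like this. It is a minimal sketch rather than a definitive implementation: `write_all`, `exists` and `save_with_backup` are hypothetical helper names, the extensions are the examples used in the list, and steps 2 and 3 are skipped on the very first save, when there is no .cfg file yet.

```cpp
#include <cstdio>
#include <string>

// Write `data` to `path` in full; returns false on any error.
bool write_all(const std::string& path, const std::string& data) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    bool ok = std::fwrite(data.data(), 1, data.size(), f) == data.size();
    ok = (std::fflush(f) == 0) && ok;
    ok = (std::fclose(f) == 0) && ok;  // always close, even after an error
    return ok;
}

// Crude existence check: can the file be opened for reading?
bool exists(const std::string& path) {
    if (std::FILE* f = std::fopen(path.c_str(), "rb")) {
        std::fclose(f);
        return true;
    }
    return false;
}

bool save_with_backup(const std::string& base, const std::string& data) {
    const std::string cfg = base + ".cfg";
    const std::string tmp = base + ".tmp";
    const std::string bak = base + ".bak";

    // Step 1: write the new content to the temporary file.
    if (!write_all(tmp, data)) {
        std::remove(tmp.c_str());  // do not leave a half-written temp file
        return false;
    }
    if (exists(cfg)) {
        // Step 2: delete the old backup; failing because there is none is OK.
        std::remove(bak.c_str());
        // Step 3: the current file becomes the backup. Stop if this fails.
        if (std::rename(cfg.c_str(), bak.c_str()) != 0) return false;
    }
    // Step 4: the temporary file becomes the current file.
    return std::rename(tmp.c_str(), cfg.c_str()) == 0;
}
```

As long as step 2 succeeds, neither rename targets a name that already exists, which sidesteps the implementation-defined behaviour of rename() raised in the comments on the question.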

When loading the file, check every step; if anything goes wrong, fall back to the backup file.
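The loading side could be sketched along these lines. The names `read_all` and `load_with_fallback` are hypothetical, and `is_valid` stands for whatever check the caller uses (end marker present, checksum matches, the data parses); the sketch only shows the order in which the files are tried.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <initializer_list>
#include <string>

// Read a whole file into `out`; returns false if it cannot be opened or read.
bool read_all(const std::string& path, std::string& out) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    out.clear();
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        out.append(buf, n);
    }
    const bool ok = std::ferror(f) == 0;
    std::fclose(f);
    return ok;
}

// Try the main file first, then the backup. `is_valid` is supplied by the
// caller and decides whether the contents are complete and trustworthy.
bool load_with_fallback(const std::string& base,
                        const std::function<bool(const std::string&)>& is_valid,
                        std::string& out) {
    for (const char* ext : {".cfg", ".bak"}) {
        std::string contents;
        if (read_all(base + ext, contents) && is_valid(contents)) {
            out = contents;
            return true;
        }
    }
    return false;  // neither file is usable; the caller decides what happens next
}
```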

idbrii
Mats Petersson
  • I wonder if Windows file renames are "flushed" to disk before they return. If not, then power failure after you save could "roll back" your saved file. I would assume it guarantees a "flush" but I can't find any mention of it. – Mooing Duck Aug 15 '13 at 21:35
  • I'm not 100% sure (and of course, some disks may not ACTUALLY flush to disk, even if they say they have!), but I expect the rename to be atomic in the sense that it either is complete, or not done at all. – Mats Petersson Aug 15 '13 at 21:39
  • And unless I'm missing something, either the renames are done and thus complete, or the old file should still be there, assuming the OS doesn't do something daft in its "rename" like "Remove old filename (flush), put new filename (flush)" in a way that the "new filename" wasn't created, but the old one removed - which would be completely bonkers. – Mats Petersson Aug 15 '13 at 21:42
  • What if it crashes after step 3? Then all data might be on the disk, but there is no abc.cfg – BeniBela Aug 02 '16 at 20:18
  • Yes, but the data is still there, not lost. And unless we're talking about the SYSTEM crashing, your app should not crash in this position if it has been tested properly. However, the original question is about "how to make sure the file itself is either complete or not used" - in other words, there's no "half the information there, half missing" in the file itself, which the above solves adequately. – Mats Petersson Aug 03 '16 at 06:04
  • Regarding OS doing something daft, in case of OS crash, rename to non-existing filename on BTRFS may cause file to be lost - https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_rename.3F . Not sure if problem here though, i.e. will step 3. always correctly finish before issue with step 4. can happen? – stoper Sep 08 '20 at 16:31
  • If we skipped abc.bak, then abc.cfg and abc.tmp ensure we always have at least one valid file on disk. So is abc.bak there in case `rename()` corrupts its input file? Is that possible? – idbrii May 30 '22 at 21:25
  • I don't see why rename would mess up the content of either file - it may FAIL to rename one or the other [and as I wrote, stop if the rename fails - that probably means some other process is competing with your rename - which is bad]. Of course, saving the old file as .bak isn't necessary - you can skip that step, but that would mean if your tmp file was "bad", you don't know what was in the old file before changes - which is the point of the .bak file. – Mats Petersson Jun 04 '22 at 21:30
1

You should use a database management system that guarantees ACID for this. If you insist on using flat files, you should write to a temp file, copy it over the actual file once writing completes, and only delete the temp file when the copy has succeeded. Also, call flush() on every write to the file.
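In cstdio terms, "flush on every write" would be fflush(). A small sketch, assuming the data arrives as a list of records (`write_records` is a hypothetical name): note that fflush() only hands the buffered bytes to the operating system. Forcing them onto the disk needs a platform-specific call such as fsync() on POSIX or FlushFileBuffers() on Windows, which is not shown here.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Write each record and flush stdio's buffer after every write, checking
// the result of both calls. Returns false on the first error.
bool write_records(const std::string& path,
                   const std::vector<std::string>& records) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    bool ok = true;
    for (const std::string& r : records) {
        if (std::fwrite(r.data(), 1, r.size(), f) != r.size() ||
            std::fflush(f) != 0) {
            ok = false;
            break;
        }
    }
    if (std::fclose(f) != 0) ok = false;  // fclose can report a late write error
    return ok;
}
```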

segfault
  • Since the temporary file is for in case the program dies during saving, what would be the best way to tell if and when a failure occurred once the program is back up and looking at the files, so that it can be corrected? – Anne Quinn Aug 15 '13 at 20:30
  • You should know name of the temporary file (e.g., program.file). Your program should check for existence of the file, and if the file exists, do something. – segfault Aug 15 '13 at 20:33
1

This is a simple and more limited answer with an implementation to provide more clarity.

It's similar to Mats Petersson's answer but only uses two files because the likelihood that rename() corrupts files seems lower than the likelihood that I implement something wrong as this code gets more complex. (It should be changing an inode or Master File Table entry and not touching file content. Probably shouldn't use this if your application might be deployed on FAT file systems.)

The steps are roughly the same:

  • Write temp file in the same directory as target file. Abort if this step fails.
  • Remove target file.
  • Rename temp file to target file.

Here's an implementation that writes char* to a file. I expect you have a unicode wrapper around file functions, so be sure to use that!

// Replace unicode:: with your unicode-aware stdio functions.

#include <stdio.h>
#include <string>

static const char* k_TempExtension = ".tmp";

bool Storage::SaveFile_Safe(std::string& fullpath, const char* data, unsigned int len) {
    std::string tempfile = fullpath + k_TempExtension;

    bool success = false;
    FILE* f = unicode::fopen(tempfile.c_str(), "wb");
    if (f) {
        const size_t count = 1;
        size_t written = fwrite(data, len, count, f);
        if (len == 0 && written == 0) {
            // Close enough to writing everything.
            written = count;
        }

        if (fclose(f) != 0) {
            written = 0;
        }

        if (written == count) {
            const char* tempfile_c = tempfile.c_str();
            const char* fullpath_c = fullpath.c_str();
            unicode::remove(fullpath_c);
            // If we fail between these two function calls, we still have our
            // temp file that we should attempt to load.
            success = 0 == unicode::rename(tempfile_c, fullpath_c);
        }
    }
    return success;
}

const char* Storage::LoadFile_Safe(std::string& fullpath) {
    FILE* f = unicode::fopen(fullpath.c_str(), "rb");

    if (f == 0) {
        // If we failed during save, load the temp file.
        std::string tempfile = fullpath + k_TempExtension;
        f = unicode::fopen(tempfile.c_str(), "rb");
    }

    if (f == 0) {
        // Neither the target nor the temp file could be opened.
        return nullptr;
    }

    // ... do loading code here
}
idbrii
-5

You should prevent the application from being closed while it is saving the data to the file. What you should do is load the old file, keep it in a variable, overwrite the data of the variable (in your app), and write it over the old file. Everything should take less than 1 second, so you shouldn't worry about your app being closed while saving. Also, overwrite ONLY after you have checked that the operation is possible and that the data's integrity is intact.

BVdjV
  • You can't prevent an application from being closed. "Everything should take less than 1 second" is speculative, since you don't know the size of the file. – segfault Aug 15 '13 at 20:17