3

I have an update method which gets called about every 16-40ms, and inside I have this code:

this.fs.writeFile("./data.json", JSON.stringify({
    totalPlayersOnline: this.totalPlayersOnline,
    previousDay: this.previousDay,
    gamesToday: this.gamesToday
}), function (err) {
    if (err) {
        return console.log(err);
    }
});

If the server throws an error, the "data.json" file sometimes becomes empty. How do I prevent that?

coNNecTT
  • 3
    **1**. write a new file with a temporary name **2**. rename the old file **3**. rename the new file to the destination name **4**. delete the old file – Denys Séguret Jun 17 '15 at 08:32
  • 1
    write it twice each time, first as `data.json.bak` then as `data.json`. one of them will not be empty. – dandavis Jun 17 '15 at 08:41
  • 2
    Wouldn't solving the reason why the server is throwing errors also be something to look at? Or at least add some proper error handling so you can exit the process cleanly. – robertklep Jun 17 '15 at 09:26
  • Yeah, I'm doing that; I always try to make my servers bug-free, but in case an unknown error happens while the server is being hosted, I try to make sure the file doesn't crash the server completely. – coNNecTT Jun 17 '15 at 10:10
  • @DenysSéguret With `fs.rename()`, you can skip steps **2** and **4** and avoid a race/crash condition where the destination name does not exist. – binki Aug 08 '17 at 15:18

3 Answers

6

Problem

fs.writeFile is not an atomic operation. Here is an example program which I will run strace on:

#!/usr/bin/env node
const { writeFile, } = require('fs');

// nodejs won’t exit until the pending writeFile operation completes.
new Promise(function (resolve, reject) {
    writeFile('file.txt', 'content\n', function (err) {
        if (err) {
            reject(err);
        } else {
            resolve();
        }
    });
});

When I ran that under strace -f and tidied up the output to show just the syscalls from the writeFile operation (which actually spans multiple IO threads), I got:

open("file.txt", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 9
pwrite(9, "content\n", 8, 0)            = 8
close(9)                                = 0

As you can see, writeFile completes in three steps.

  1. The file is open()ed. This is an atomic operation that, with the provided flags, either creates an empty file on disk or, if the file exists, truncates it. Truncation matters: if the existing file were longer than the data you subsequently write, the leftover bytes beyond the new data would otherwise remain in the file. To avoid this, you truncate.
  2. The content is written. Because I wrote such a short string, this is done with a single pwrite() call, but for larger amounts of data I assume it is possible nodejs would only write a chunk at a time.
  3. The handle is closed.

My strace had each of these steps occurring on a different node IO thread. This suggests to me that fs.writeFile() might actually be implemented in terms of fs.open(), fs.write(), and fs.close(). Thus, nodejs does not treat this complex operation like it is atomic at any level—because it isn’t. Therefore, if your node process terminates, even gracefully, without waiting for the operation to complete, the operation could be at any of the steps above. In your case, you are seeing your process exit after writeFile() finishes step 1 but before it completes step 2.

Solution

The common pattern for transactionally replacing a file’s contents with a POSIX layer is to use these steps:

  1. Write the data to a differently named file, fsync() the file (See “When should you fsync?” in “Ensuring data reaches disk”), and then close() it.
  2. rename() (or, on Windows, MoveFileEx() with MOVEFILE_REPLACE_EXISTING) the differently-named file over the one you want to replace.

Using this algorithm, the destination file is either updated or not regardless of when your program terminates. And, even better, journalled (modern) filesystems will ensure that, as long as you fsync() the file in step 1 before proceeding to step 2, the two operations will occur in order. I.e., if your program performs step 1 and then step 2 but you pull the plug, when you boot up you will find the filesystem in one of the following states:

  • None of the two steps are completed. The original file is intact (or if it never existed before, it doesn’t exist). The replacement file is either nonexistent (step 1 of the writeFile() algorithm, open(), effectively never succeeded), existent but empty (step 1 of writeFile() algorithm completed), or existent with some data (step 2 of writeFile() algorithm partially completed).
  • The first step completed. The original file is intact (or if it didn’t exist before it still doesn’t exist). The replacement file exists with all of the data you want.
  • Both steps completed. At the path of the original file, you can now access your replacement data—all of it, not a blank file. The path you wrote the replacement data to in the first step no longer exists.

The code to use this pattern might look like the following:

const { writeFile, rename, } = require('fs');

function writeFileTransactional (path, content, cb) {
    // The replacement file must be in the same directory as the
    // destination because rename() does not work across device
    // boundaries.

    // This simple choice of replacement filename means that this
    // function must never be called concurrently with itself for the
    // same path value. Also, properly guarding against other
    // processes trying to use the same temporary path would make this
    // function more complicated. If that is a concern, a proper
    // temporary file strategy should be used. However, this
    // implementation ensures that any files left behind during an 
    // unclean termination will be cleaned up on a future run.
    let temporaryPath = `${path}.new`;
    writeFile(temporaryPath, content, function (err) {
        if (err) {
            return cb(err);
        }

        rename(temporaryPath, path, cb);
    });
}

This is basically the same solution you’d use for the same problem in any language/framework.

binki
  • I wondered if `let tFile = path + '.new'` was faster. I added a random number too, so then it's not faster, but safer in some ways. There is also no mention of how renameSync handles overwriting (cleanly, no doubt, but the node docs could mention it). I'm also suggesting the Sync variation alongside writeFileSync as well. – Master James Aug 08 '17 at 07:35
  • @MasterJames There already is [a question discussing overwrite behavior](https://stackoverflow.com/q/21219018/429091) but none of the answers point to any nodejs documentation explaining why it works. I’m not sure there is any. Basically, libuv attempts to mimic POSIX even on Windows and thus [libuv’s `rename()`](https://github.com/libuv/libuv/blob/371ca6d4b2f9dbb0a0b012a7a8e2bad26cfd402b/src/win/fs.c#L1272), which nodejs calls, implements the overwrite behavior. – binki Aug 08 '17 at 14:15
  • @MasterJames Also, if you’re avoiding string templating just for performance reasons, you’re probably missing the point of optimization. In this code, the only thing that takes any amount of time is the IO with the disk. So the only way to improve performance is to restructure the entire application itself to write to disk less often or avoid waiting for writes to finish before proceeding. I’d imagine that any good JavaScript interpreter should internally create the same bytecode/JIT results for `\`${a}b\`` and `a + 'b'`. – binki Aug 08 '17 at 14:18
  • @MasterJames Also, I know some people recommend the random number approach. But I personally find that harder because, if your program crashes, it will leave behind a new file each time. If it uses a predictable/derivable filename, then the next run of the command will automatically clean up the old one by overwriting it. If you are worrying about symlink attacks or whatever, then using a random number is not enough. You should instead be calling something like [`mkstemp`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/mkdtemp.html) or using a package which uses a safe pattern. – binki Aug 08 '17 at 15:13
  • I put it in a tmp folder instead of using a random name now. Still, sometimes the file is loaded empty, so the pros and cons of losing the file (and possibly file watches) are, I suspect, up to the user. I think loading or reading the file should look again if it is missing or empty/truncated, but it still seems to be empty sometimes when trying the rename approach. I'm thinking nodejs should pause reads when truncating before writing changes; renaming is only slightly better in that there's a smaller window in which the file can be found empty, depending on file size. – Master James Aug 10 '17 at 18:49
  • Needs to work across multiple node instances on multiple mac(hine)s. Maybe the bigger of two loads/reads is required? – Master James Aug 10 '17 at 18:55
  • Maybe `fs.writeFile()`’s lack of a call to `fsync()` makes it insufficient for such a thing. But then suddenly things get more complicated. Does [this pattern](https://gist.github.com/binki/4c0f9bf33f7ffcda273d1ce255c87bf2) work? What exact setup are you using? Remote/shared filesystem? – binki Aug 10 '17 at 19:08
  • @MasterJames See prior comment – binki Aug 10 '17 at 19:08
  • @MasterJames Oh, and I can see how handling multiple processes would require something like a random number. However, don’t forget to use O_EXCL (pass `x` to `fs.open()` and be prepared to handle a conflict condition (which would be to generate a new random number and try again and have some policy for cleaning up old temp files)). – binki Aug 10 '17 at 19:21
  • I've posted a separate solution here to help answer your questions in response to my experiences. Thanks for your continued support! – Master James Aug 11 '17 at 08:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/151729/discussion-between-binki-and-master-james). – binki Aug 11 '17 at 14:05
  • Hi, thanks. I'll just be repeating myself at this point. I didn't try all that you say. I have limited testing of any issue, as it was only in my IDE that I originally noticed rewritten files appearing blank. In my code that reads those files, it is now supposed to do a double take if the file is missing or empty. This seems the least intrusive or complicated option. I just use nodejs 8+ now, so I have a more complicated wrapper for writeFileSync and readFileSync that I'm trusting is better than redoing it with fsync. Maybe there's an option flag I'm missing there? The opposite of this, maybe O_NONBLOCK? Or set an offset of 0? – Master James Aug 13 '17 at 07:24
  • @binki: thanks for the detailed explanation. I am wondering if this is still a valid and recommended pattern, and I am curious about the syscalls for rename: it's still not clear to me, if renaming the file from xyz.ext.temp to xyz.ext got terminated abruptly, what the status of these two files would be. – Sohail Faruqui Aug 25 '20 at 19:33
  • 1
    @SohailFaruqui On modern OSes/filesystems, `rename()` within the same filesystem (normally you can assume rename within a directory is within the same filesystem) is an “atomic” operation. This means that the operation either completes fully or does not complete at all—it cannot be partially completed. If you call `rename()` and the system has a power failure, two possible outcomes are allowed: there are two files where `xyz.ext.temp` will have your new data and `xyz.ext` will have the old data OR there is one file `xyz.ext` with the new data. – binki Aug 26 '20 at 04:59
  • 1
    @SohailFaruqui I recommend reading https://lwn.net/Articles/457667/ and https://lwn.net/Articles/322823/ . It is a complicated history. To be “correct” one should call `open()` on the temp file, `write()`, `fsync()`, `close()`, and then `rename()`—but a lot of people omit the `fsync()` call and a lot of OS/filesystem combinations will work correctly/safely without the call to `fsync()`. And you must recognize that, even if the `rename()` call returns, the `rename()` may appear to have never happened if you have a power failure until after an `fsync()` on the directory itself returns. – binki Aug 26 '20 at 05:08
0

If the error is caused by bad input (the data you want to write), then make sure the data is as it should be before calling writeFile. If the error is caused by a failure of writeFile even though the input is OK, you could keep retrying until the file is actually written. One way is to use the async doWhilst function.

async.doWhilst(
    function (callback) {
        // your write here; call back without an error on failure so the loop retries
        fs.writeFile('./data.json', data, function () {
            callback(null);
        });
    },
    function () {
        // keep looping while the file is still missing or empty
        return !fs.existsSync('./data.json') || fs.statSync('./data.json').size === 0;
    },
    function (err) {
        // here the file is not empty
    }
);
cs04iz1
0

I didn't run real tests with this; I just noticed, when manually reloading my IDE, that sometimes the file was empty. What I tried first was the rename method and noted the same problem, but recreating a new file was less desirable (considering file watches etc.).

My suggestion, and what I'm doing now, is this: in your own readFileSync wrapper, check whether the file is missing or the data returned is empty, and sleep for 100 milliseconds before giving it another try. I suppose a third try with more delay would really push the sigma up a notch, but currently I'm not going to do it, as the added delay is hopefully an unnecessary negative (I would consider a promise at that point). There are other recovery options, relative to your own code, that you can add just in case. "File not found or empty?" is basically "retry another way."

My custom writeFileSync has an added flag to toggle between using the rename method (with write sub-dir '._new' creation) or the normal direct method, as your code's needs may vary. Choosing between them based on file size is my recommendation.

In this use case the files are small and only updated by one node instance / server at a time. I can see adding a random file name with rename as another option later, if needed, to allow multiple machines to write. Maybe a retry limit argument as well?

I was also thinking you could write to a local temp file and then copy it to the shared target by some means (maybe also rename on the target for a speed increase), and then clean up (unlink the local temp) of course. I guess that idea is kind of pushing it toward shell commands, so not better. Anyway, the main idea here is still to read twice if the file is found empty. I'm sure it's safe from being partially written, via nodejs 8+ onto a shared Ubuntu-type NFS mount, right?

Master James