76

I am using Python to write chunks of text to files in a single operation:

open(file, 'w').write(text)

If the script is interrupted before a file write completes, I want to end up with no file rather than a partially written one. Can this be done?

martineau
hoju
  • 1
    related: [Threadsafe and fault-tolerant file writes](http://stackoverflow.com/questions/12003805/threadsafe-and-fault-tolerant-file-writes) – jfs Sep 10 '12 at 07:35

7 Answers

118

Write the data to a temporary file and, once it has been written successfully, rename that file to the correct destination file, e.g.

import os

with open(tmpFile, 'w') as f:
    f.write(text)
    # make sure that all data is on disk
    # see http://stackoverflow.com/questions/7433057/is-rename-without-fsync-safe
    f.flush()
    os.fsync(f.fileno())
# os.replace needs Python 3.3+; before that only os.rename exists, and it fails on Windows if myFile already exists
os.replace(tmpFile, myFile)

According to the documentation for os.replace (http://docs.python.org/library/os.html#os.replace):

Rename the file or directory src to dst. If dst is a non-empty directory, OSError will be raised. If dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement).

Note:

  • The rename is atomic only if src and dst are on the same filesystem; across filesystems it may fail.

  • The os.fsync step may be skipped if performance or responsiveness matters more than data integrity in cases such as a power failure or system crash.
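
A minimal sketch of the whole sequence, assuming Python 3.3+. The helper name `atomic_write_text` and its `durable` flag are made up for illustration and are not part of any library. It creates the temporary file with `tempfile.mkstemp` in the destination's directory, so the rename stays on one filesystem, and it removes the temporary file if the write is interrupted:

```python
import os
import tempfile


def atomic_write_text(path, text, durable=True):
    """Hypothetical helper: readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates a uniquely named file in the target directory,
    # so the later rename does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            if durable:
                # Optional: push the data to disk before the rename.
                f.flush()
                os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Interrupted or failed: drop the temp file, leave the target untouched.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that `mkstemp` creates the file readable and writable only by the current user, so the result may have tighter permissions than a file created with plain `open`.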

ShadowRanger
Anurag Uniyal
  • 11
    For completeness, the [tempfile](http://docs.python.org/library/tempfile.html) module provides an easy, safe way to create temporary files. – itsadok Aug 29 '11 at 04:54
  • 10
    And for more completeness: `rename` is atomic only within same filesystem on POSIX, so the easiest way is to create `tmpFile` in the directory of `myFile`. – darkk Jan 13 '12 at 15:10
  • I found this won't work on Windows if the file already exists: "On Windows, if dst already exists, OSError will be raised" – hoju Jun 30 '12 at 08:29
  • 1
    While `os.fsync` is necessary if you're worried about the OS shutting down suddenly (such as loss of power or kernel panic) it's overkill for the case where you're just concerned about the process being interrupted. – R Samuel Klatchko Sep 01 '12 at 16:20
  • @RSamuelKlatchko yes, maybe, but it is a single line, doesn't hurt, and will save data in the rare cases you mentioned. – Anurag Uniyal Sep 01 '12 at 23:41
  • 1
    @AnuragUniyal - whether it hurts or not depends on how often the atomic write is done. `os.fsync` can be very slow as it has to wait for the kernel to flush its buffers. If someone uses this code to write multiple files, it can definitely cause measurable slow downs. – R Samuel Klatchko Sep 02 '12 at 02:49
  • @RSamuelKlatchko yes I agree, I will update question with that info. – Anurag Uniyal Sep 02 '12 at 20:36
  • [fsync() might not be enough on OSX](http://downloads.sehe.nl/zfs-fuse/eat_my_data.odp). For portability [`os.replace()` or its analogs could be used on Windows](http://stackoverflow.com/a/12012813/4279). – jfs Sep 10 '12 at 07:45
  • Just to be complete.. the tmpFile should be inside the same directory as the destination file, to ensure that they are on the same filesystem – Romuald Brunet May 09 '13 at 13:01
  • 2
    @J.F.Sebastian note that SQLite adds this `fsync(opendir(filename))` to ensure that the rename is written to disk too. This does not affect the atomicity of this modification, only the relative order of this operation vs the previous/next one on a different file. – Dima Tisnek Mar 14 '14 at 12:09
  • @qarma: note: my comment is about `fsync()` on the **file** in the answer, not directory -- there are several issues here -- and the possible necessity for `fsync(opendir())` only confirms that *`fsync()` in the answer is not enough*. – jfs Sep 28 '15 at 19:05
  • If `myfile` is a symlink, this would make it a normal file. Would using `os.rename(tmpfile, os.path.realpath(myfile))` hurt the atomic feature? – Jason May 18 '19 at 03:23
  • Why reinvent the wheel? There are now libraries to do that, like `safer` for instance. You can use memory or temporary files and it even works with sockets! – Eric Aug 17 '20 at 10:33
  • you left out the generation of the filename of the temporary file. What if two processes try to write to the same temporary file? – Boris Verkhovskiy Oct 22 '20 at 07:28
  • @hoju: `os.rename` doesn't work on Windows, which is why they added `os.replace` (equivalent to `os.rename` on POSIX, and works the same as POSIX `os.rename` on Windows). – ShadowRanger Sep 28 '22 at 23:07
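
The directory fsync mentioned in the comments above could look roughly like this on POSIX systems. `fsync_directory` is a hypothetical helper name. Opening a directory with `os.open` is not expected to work on Windows, so treat this as a POSIX-only sketch:

```python
import os


def fsync_directory(path):
    """POSIX-only sketch: flush the directory that contains `path` to disk."""
    directory = os.path.dirname(os.path.abspath(path))
    fd = os.open(directory, os.O_RDONLY)
    try:
        # Asks the OS to persist the directory entries (e.g. a completed rename).
        os.fsync(fd)
    finally:
        os.close(fd)
```

It would be called right after `os.replace(tmpFile, myFile)`, and only matters when surviving a power loss or kernel crash is a requirement.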
27

A simple snippet that implements atomic writing using Python tempfile.

with open_atomic('test.txt', 'w') as f:
    f.write("huzza")

or even reading and writing to and from the same file:

with open('test.txt', 'r') as src:
    with open_atomic('test.txt', 'w') as dst:
        for line in src:
            dst.write(line)

using two simple context managers

import os
import tempfile as tmp
from contextlib import contextmanager

@contextmanager
def tempfile(suffix='', dir=None):
    """ Context for temporary file.

    Will find a free temporary filename upon entering
    and will try to delete the file on leaving, even in case of an exception.

    Parameters
    ----------
    suffix : string
        optional file suffix
    dir : string
        optional directory to save temporary file in
    """

    tf = tmp.NamedTemporaryFile(delete=False, suffix=suffix, dir=dir)
    tf.file.close()
    try:
        yield tf.name
    finally:
        try:
            os.remove(tf.name)
        except FileNotFoundError:
            # the file was already moved to its destination; nothing to clean up
            pass

@contextmanager
def open_atomic(filepath, *args, **kwargs):
    """ Open temporary file object that atomically moves to destination upon
    exiting.

    Allows reading and writing to and from the same filename.

    The file will not be moved to destination in case of an exception.

    Parameters
    ----------
    filepath : string
        the file path to be opened
    fsync : bool
        whether to force write the file to disk
    *args : mixed
        Any valid arguments for :code:`open`
    **kwargs : mixed
        Any valid keyword arguments for :code:`open`
    """
    fsync = kwargs.pop('fsync', False)

    with tempfile(dir=os.path.dirname(os.path.abspath(filepath))) as tmppath:
        with open(tmppath, *args, **kwargs) as file:
            try:
                yield file
            finally:
                if fsync:
                    file.flush()
                    os.fsync(file.fileno())
        os.replace(tmppath, filepath)  # like os.rename, but also overwrites an existing file on Windows
Nils Werner
  • 1
    The temp file needs to be on the same file system as the file to be replaced. This code will not work reliably on systems with multiple file systems. The NamedTemporaryFile invocation needs a dir= parameter. – textshell Jul 16 '16 at 22:16
  • Thanks for the comment, I've recently changed this snippet to fall back to `shutil.move` in case of `os.rename` failing. This allows it to work across FS boundaries. – Nils Werner Jul 18 '16 at 10:16
  • 3
    That appears to work when running it, but shutil.move uses copy2, which is not atomic. And if copy2 wanted to be atomic, it would need to create a temporary file in the same file system as the destination file. So the fix to fall back to shutil.move only masks the problem. That is why most snippets place the temporary file in the same directory as the target file, which is also possible with tempfile.NamedTemporaryFile using the dir named argument. As moving over a file in a directory which is not writable doesn’t work anyway, that seems to be the simplest and most robust solution. – textshell Jul 18 '16 at 17:17
  • Correct, I assumed that `shutil.move()` was non-atomic due to `shutil.copy2()` and a remove being called in succession. The new implementation (see edit) will now instead create the file in the current directory and also handle exceptions better. – Nils Werner Jul 19 '16 at 12:23
  • How can this be atomic while reading from and writing to the same file? In the example above, `open('test.txt', 'r') as src:` is used to read the file contents. Writing in this sense is atomic, but reading might not be. For file types like `.ini`, it plays up with decorators when used with configparser for read operations. I am not sure this sample fully justifies atomicity when reading from the same file over 200000 threads. It will throw a Too Many Open Files error. – bh4r4th Jan 10 '20 at 05:27
  • @bh4r4th I don't understand your comment. But atomicity or not, opening 200,000 files is simply too many. – Nils Werner Jan 10 '20 at 08:08
  • Yeah, make sense too many files. I am updating a file which stores the status after every update. I have 200000 updates getting triggered. Will change my implementation. – bh4r4th Jan 13 '20 at 01:10
  • `tempfile.NamedTemporaryFile().name` always starts with `/tmp` for me. If [tmpfs](https://en.wikipedia.org/wiki/Tmpfs) is an in-memory file system, how can your code be atomic if it needs to write the file's contents from in-memory tmpfs to disk? – Boris Verkhovskiy Oct 22 '20 at 07:32
  • @BorisVerkhovskiy: You didn't pass `dir=`, so it puts the file in the default temp directory. You need to pass `dir=os.path.dirname(path_to_orig_file)` to put it in the same directory as the original file, and thereby allow an atomic rename within the same file system. – ShadowRanger Sep 28 '22 at 23:03
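
To illustrate the point about `dir=` raised in the comments, here is a small sketch. Both function names are hypothetical. Comparing `st_dev` is only a quick check, not a guarantee that a rename will succeed (for example, it says nothing about permissions or unusual mount setups):

```python
import os
import tempfile


def sibling_temp_path(target):
    """Create an empty temp file in the same directory as `target` and return its path."""
    directory = os.path.dirname(os.path.abspath(target))
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as tf:
        return tf.name


def on_same_filesystem(path_a, path_b):
    """Return True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

A temp file created this way sits next to its target, so `os.replace` can swap it in without copying data between file systems.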
19

Since it is very easy to mess up the details, I recommend using a tiny library for that. The advantage of a library is that it takes care of all these nitty-gritty details and is reviewed and improved by a community.

One such library is python-atomicwrites by untitaker which even has proper Windows support:

Caveat (as of 2023):

This library is currently unmaintained. Comment from the author:

[...], I thought it'd be a good time to deprecate this package. Python 3 has os.replace and os.rename which probably do well enough of a job for most usecases.

Original recommendation:

From the README:

from atomicwrites import atomic_write

with atomic_write('foo.txt', overwrite=True) as f:
    f.write('Hello world.')
    # "foo.txt" doesn't exist yet.

# Now it does.

Installation via pip:

pip install atomicwrites
vog
6

I’m using this code to atomically replace/write a file:

import os
from contextlib import contextmanager

@contextmanager
def atomic_write(filepath, binary=False, fsync=False):
    """ Writeable file object that atomically updates a file (using a temporary file).

    :param filepath: the file path to be opened
    :param binary: whether to open the file in a binary mode instead of textual
    :param fsync: whether to force write the file to disk
    """

    tmppath = filepath + '~'
    while os.path.isfile(tmppath):
        tmppath += '~'
    try:
        with open(tmppath, 'wb' if binary else 'w') as file:
            yield file
            if fsync:
                file.flush()
                os.fsync(file.fileno())
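        os.replace(tmppath, filepath)  # like os.rename, but also overwrites an existing file on Windows
    finally:
        try:
            os.remove(tmppath)
        except OSError:
            # tmppath is already gone after a successful rename
            pass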

Usage:

with atomic_write('path/to/file') as f:
    f.write("allons-y!\n")

It’s based on this recipe.

Jakub Jirutka
  • 1
    The while loop is racy: two concurrent processes could end up opening the same file. tempfile.NamedTemporaryFile can overcome this. – Mic92 May 31 '16 at 15:29
  • 2
    I think a tmppath like '.{filepath}~{random}' would be better; it avoids clashes if two processes do the same thing. This does not solve the race condition, but at least you don't get a file with the content of two processes. – guettli Oct 11 '16 at 09:53
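
One way to close the race described in these comments, sketched under the assumption of Python 3.3+: open the temporary file in exclusive-creation mode (`'x'`), so that checking for the name and creating the file happen in a single step. `create_unique_temp` is a hypothetical helper, not part of the answer's code:

```python
import itertools


def create_unique_temp(filepath):
    """Open a brand-new temp file next to `filepath`; never reuse an existing one."""
    for n in itertools.count(1):
        tmppath = filepath + "~" * n
        try:
            # Mode "x" fails with FileExistsError if the file already exists,
            # so two processes can never both obtain the same temp file.
            return open(tmppath, "x"), tmppath
        except FileExistsError:
            continue
```

The returned file object and path would take the place of the `while os.path.isfile(...)` loop and the `open(tmppath, ...)` call in `atomic_write`.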
3

Just link the file after you're done:

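with tempfile.NamedTemporaryFile(mode="w", dir=os.path.dirname(os.path.abspath(final_filename))) as f:
    f.write(...)
    f.flush()  # write buffered data out before the file appears under its final name
    os.link(f.name, final_filename)  # hard links cannot cross filesystems, hence dir= above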

If you want to get fancy:

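import contextlib
import os
import tempfile

@contextlib.contextmanager
def open_write_atomic(filename: str, **kwargs):
    kwargs['mode'] = 'w'
    with tempfile.NamedTemporaryFile(**kwargs) as f:
        yield f
        f.flush()  # write buffered data out before the file appears under its final name
        os.link(f.name, filename)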
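
One property of this approach worth knowing: unlike `os.replace`, `os.link` refuses to overwrite an existing destination and raises `FileExistsError`. That makes it suitable for "create only if absent" semantics. A minimal sketch, with `publish_if_absent` as a hypothetical helper name:

```python
import os
import tempfile


def publish_if_absent(text, final_path):
    """Create final_path with `text`; return False if it already exists."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temp file lives next to the target because hard links
    # cannot cross filesystem boundaries.
    with tempfile.NamedTemporaryFile(mode="w", dir=directory) as f:
        f.write(text)
        f.flush()
        try:
            os.link(f.name, final_path)
        except FileExistsError:
            return False
    return True
```

The temporary name is deleted when the `with` block exits, while the linked file stays in place under its final name.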
blais
2

The answers on this page are quite old; there are now libraries that do this for you.

In particular, safer is a library designed to help prevent programmer error from corrupting files, socket connections, or generalized streams. It is quite flexible: among other things, it can use either memory or temporary files, and you can even keep the temp files in case of failure.

Their example is just what you want:

# dangerous
with open(filename, 'w') as fp:
    json.dump(data, fp)
    # If an exception is raised, the file is empty or partly written
# safer
with safer.open(filename, 'w') as fp:
    json.dump(data, fp)
    # If an exception is raised, the file is unchanged.

It's in PyPI, just install it using pip install --user safer or get the latest at https://github.com/rec/safer

Eric
-2

A solution for Windows that loops over a folder and renames the files in it. It has been tested. To reduce the risk of two files ending up with the same name, widen the random range. The letters come from the random library's random.choice method, and the digits from str(random.randrange(50, 999999999, 2)). You can vary the digit range as you want.

import os
import random

path = "C:\\Users\\ANTRAS\\Desktop\\NUOTRAUKA\\"

def renamefiles():
    files = os.listdir(path)
    i = 1
    for file in files:
        os.rename(os.path.join(path, file), os.path.join(path, 
                  random.choice('ABCDEFGHIJKL') + str(i) + str(random.randrange(31,9999999,2)) + '.jpg'))
        i = i+1

for x in range(30):
    renamefiles()
Thomas Fritsch
Mindaugas Vaitkus