
Hi, I am trying to write an atomic write function like so:

with tempfile.NamedTemporaryFile(mode="w", dir=target_directory) as f:
    # perform file writing operation
    os.replace(f.name, target_file_name)

I am struggling to figure out what the best call would be on line 3. Should I use os.replace(), os.rename(), or should I create a hard link between the temporary file and the target file using os.link()?

Does os.link() use more memory? What are the benefits of each, and are all of them atomic?

DZtron
    This is not only platform-specific but also depends on the underlying filesystem. I doubt Python will give any hard guarantees here. There are configurations that don't allow atomic moves at all (Windows on FAT32, for example, although I doubt there are many of those systems around any more). – Voo Feb 24 '20 at 15:26

3 Answers


os.rename and os.replace are both implemented using the same function (internal_rename in CPython's Modules/posixmodule.c).

The only difference is that os.replace passes is_replace=1, which has no effect on POSIX but sets the MOVEFILE_REPLACE_EXISTING flag on Windows (from Microsoft's MoveFileEx documentation):

If a file named lpNewFileName exists, the function replaces its contents with the contents of the lpExistingFileName file, provided that security requirements regarding access control lists (ACLs) are met. For more information, see the Remarks section of this topic.

If lpNewFileName or lpExistingFileName name a directory and lpExistingFileName exists, an error is reported.
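A small sketch of what that difference means in practice, using a throwaway temporary directory (the `demo` helper is hypothetical, for illustration only). Per the Python docs, os.replace overwrites an existing destination on every platform, while os.rename overwrites it on POSIX but raises FileExistsError on Windows:

```python
import os
import tempfile

def demo(move):
    """Move 'src' onto an already existing 'dst' with `move`; return dst's contents."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        dst = os.path.join(d, "dst")
        with open(src, "w") as f:
            f.write("new")
        with open(dst, "w") as f:
            f.write("old")
        move(src, dst)  # dst exists at this point
        with open(dst) as f:
            return f.read()

# os.replace overwrites dst on every platform.
print(demo(os.replace))  # prints: new

# os.rename overwrites on POSIX, but raises FileExistsError on Windows.
try:
    print(demo(os.rename))
except FileExistsError as exc:
    print("os.rename refused:", exc)
```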

os.link isn't really suitable for this purpose unless you can guarantee that the target file does not exist, because os.link raises an error if it does:

$ touch a b
$ link a b
link: cannot create link 'b' to 'a': File exists
$ python3 -c 'import os; os.link("a", "b")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'a' -> 'b'
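That error can also be used deliberately: since os.link fails instead of overwriting, it can publish a fully written temporary file under the target name only if nothing exists there yet (a "first writer wins" scheme). A minimal sketch, assuming a filesystem that supports hard links; `publish_if_absent` is a hypothetical helper name:

```python
import os
import tempfile

def publish_if_absent(path, data):
    """Write `data` to a temp file, then hard-link it to `path` only if `path`
    does not exist yet. Returns True if this call created `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same filesystem as `path`
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the name appears
        try:
            os.link(tmp_path, path)  # fails instead of overwriting
            return True
        except FileExistsError:
            return False  # another writer already published `path`
    finally:
        os.unlink(tmp_path)  # drop the temp name; `path` keeps the data

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "config.txt")
    print(publish_if_absent(target, "first"))   # True
    print(publish_if_absent(target, "second"))  # False
```

Unlike os.replace, this never clobbers an existing file, which is the opposite of what the question's "always update the target" write needs.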
anthony sottile

I'm not sure what you mean by 'atomic' in this case, but the main differences are:

os.replace and os.rename are similar, but os.replace is more cross-platform (so I would assume os.rename is better if you know which system you'll be using).

os.link uses the same amount of disk space, but it creates a hard link, which might not be what you want.

Daniel Marchand
  • Atomic means that the move is, well, atomic: other applications see either the whole file or nothing, never a partial file. – Voo Feb 24 '20 at 15:23

First, os.link creates a hard link, which means that src and dst share the same data on disk. No data is copied, so there is no memory or disk-space overhead. Unfortunately, not all file systems support hard links: for instance, NTFS supports them, while FAT and ReFS do not.
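One way to see that a hard link shares the data rather than copying it is to compare inode numbers and the link count; a quick check, assuming a filesystem that supports hard links (e.g. ext4 or NTFS):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src")
    dst = os.path.join(d, "dst")
    with open(src, "w") as f:
        f.write("x" * 4096)
    os.link(src, dst)  # dst is a second directory entry for the same file
    s, t = os.stat(src), os.stat(dst)
    same_inode = s.st_ino == t.st_ino  # True: the data is stored only once
    link_count = s.st_nlink            # 2: two names refer to that data
    print(same_inode, link_count)      # True 2
```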

Secondly, os.replace should be preferred to os.rename, as it is more cross-platform (available only in Python 3.3+). Another important point is that the source and destination paths must be on the same filesystem; otherwise the move may fail.

As for atomicity, there seems to be a very high probability (but not 100%) that os.replace is atomic in all possible cases on Unix and Windows. Related links 1, 2, 3. In any case, this is the recommended approach to avoid race conditions and TOCTOU bugs. Personally, I have never encountered, or been able to reproduce, a situation where a call to os.replace left src or dst with corrupted data. However, as long as the official documentation does not guarantee this behavior on every platform, os.replace should not be considered an atomic call (especially on Windows).
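A minimal sketch of that recommended approach, assuming a text payload; `atomic_write_text` is a hypothetical helper name. The temporary file is created in the target's own directory so both paths are on the same filesystem, the data is flushed (and fsync'ed, for crash safety) and the file closed before os.replace runs, and the temporary file is removed if anything fails. On POSIX, fsync'ing the containing directory as well is often recommended for durability across power loss; that step is omitted here.

```python
import os
import tempfile

def atomic_write_text(path, data, encoding="utf-8"):
    # Temp file next to the target: os.replace may fail if src and dst
    # are on different filesystems.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        # The file is closed here (required on Windows); now swap it in.
        os.replace(tmp_path, path)  # overwrites `path` if it exists
    except BaseException:
        # Don't leave the temp file behind on any failure, then re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Readers that open `path` see either the previous contents or the new ones, subject to the platform caveats discussed above.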

Your example code is definitely not atomic by definition: at any moment another process can break the integrity of your temporary file, and abruptly terminating the process on non-Windows systems can even leave your temporary file in the specified directory forever. To solve these problems, you may need synchronization primitives such as locks, and the logic of your code must account for even the most improbable interruptions or corruptions of your data.
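One simple synchronization primitive is a lock file created with O_CREAT | O_EXCL, the same idea as the `'xt'` open mode: creation fails if the file already exists, so only one process can hold the lock at a time. A sketch under that assumption; `lock_file`, its `timeout` and `poll_interval` parameters are hypothetical names:

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def lock_file(lock_path, timeout=10.0, poll_interval=0.1):
    """Hold an exclusive lock represented by the existence of `lock_path`."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file exists: at most one winner.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        try:
            os.write(fd, str(os.getpid()).encode())  # record the owner's PID
        finally:
            os.close(fd)
        yield
    finally:
        # Release the lock. If the process is killed before this runs,
        # the lock file stays behind (a stale lock) and must be cleaned up.
        os.unlink(lock_path)
```

Each writer would wrap its write-then-os.replace sequence in `with lock_file("data.lock"):` (a hypothetical lock path next to the data file). The stale-lock case after a crash is exactly the kind of improbable interruption the code has to plan for.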

Here is an example of a common case where some data should be read from an existing file or, if that file does not exist yet, created in it:

import os
import time
filename = 'data' # file with data
temp_filename = 'data.temp' # temp used to create 'data'

def get_data():
    while True:
        try:
            os.remove(temp_filename)
        except FileNotFoundError: # if no data.temp
            try: # check if data already exists:
                with open(filename, 'rt', encoding='utf8') as f:
                    return f.read() # return data here
            except FileNotFoundError:
                pass # create data
        except PermissionError: # (Windows) another process/thread has data.temp open and is writing it right now
            time.sleep(0.1) # wait for it
            continue
        else:
            pass # a stale data.temp was left behind (something went wrong), so recreate everything

        # data creation:
        excl_access = 'xt' # 'x' raises FileExistsError if the file exists; used as a primitive lock
        try:
            with open(temp_filename, excl_access, encoding='utf8') as f:
                # process can be interrupted here 
                f.write('Hello ') # or here
                f.write('world!') # or here
        except FileExistsError: # another one is creating it now
            time.sleep(0.1) # wait for it
            continue
        except Exception: # something went wrong
            continue
        try:
            os.replace(temp_filename, filename) # not guaranteed to be 100% atomic
        except FileNotFoundError:
            continue # try again

Here's a related question with answers that recommend external libraries for handling atomic file creation.

facehugger