25

Currently I have a loop that tries to find an unused filename by adding suffixes to a filename string. Once it finds a name that does not exist, it opens a new file with that name. The problem is that this code is used in a website, where multiple requests could attempt the same thing at the same time, so a race condition exists.

How can I keep Python from overwriting an existing file if another thread creates one between the time of the check and the time of the open?

I can minimize the chance by randomizing the suffixes, but the chance is already minimized based on parts of the pathname. I want to eliminate that chance with a function that can be told, create this file ONLY if it doesn't exist.

I can use Win32 functions to do this, but I want this to work cross-platform because it will be hosted on Linux in the end.

mkrieger1
boatcoder
  • If I had to do something like that, I'd use a predefined file name and append the current time/date to it - that way, you will be guaranteed a unique file name regardless. – Helen Neely Aug 28 '09 at 16:22
  • The date is already in the filename; the problem is that on a heavily loaded web server you could easily have 2 requests in the same second. – boatcoder Aug 28 '09 at 17:02
  • 4
    Use uuid.uuid1() to create files with globally unique names. – hughdbrown Aug 28 '09 at 17:44
  • I wrote a small Python package [seqfile](https://github.com/musically-ut/seqfile) to solve this problem by generating sequential filenames in a unicode-safe, thread-safe, and OS-safe manner. – musically_ut May 08 '15 at 09:13
  • Long ago ... but perhaps someone else is looking for solutions here. We had a related discussion over [here](http://stackoverflow.com/a/28532580/3693375). Perhaps check out my OS-independent locking-by-directory https://github.com/drandreaskrueger/lockbydir – akrueger Mar 06 '16 at 22:14

4 Answers

39

Use os.open() with os.O_CREAT and os.O_EXCL to create the file. That will fail if the file already exists:

>>> import os
>>> fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 17] File exists: 'x'

Once you've created a new file, use os.fdopen() to turn the handle into a standard Python file object:

>>> fd = os.open("y", os.O_WRONLY | os.O_CREAT | os.O_EXCL)
>>> f = os.fdopen(fd, "w")  # f is now a standard Python file object

Edit: From Python 3.3, the builtin open() has an x mode that means "open for exclusive creation, failing if the file already exists".
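
For the suffix loop described in the question, a minimal sketch using that x mode might look like the following. It assumes Python 3.3 or later; the helper name `create_unique`, the `_N` suffix scheme, and the `max_tries` limit are illustrative choices, not part of the original answer.

```python
import os
import tempfile


def create_unique(base, ext="", max_tries=1000):
    """Create and return a new file named base+ext, base_1+ext, and so on.

    Mode "x" makes open() raise FileExistsError instead of overwriting,
    so the existence check and the creation happen in a single step.
    """
    for i in range(max_tries):
        suffix = "" if i == 0 else "_%d" % i
        path = base + suffix + ext
        try:
            return open(path, "x")
        except FileExistsError:
            continue  # another request already owns this name; try the next
    raise RuntimeError("no free name found for %r" % base)


# Usage: two calls with the same base never end up sharing a file.
workdir = tempfile.mkdtemp()
base = os.path.join(workdir, "report")
with create_unique(base, ".txt") as first, create_unique(base, ".txt") as second:
    first.write("first\n")
    second.write("second\n")
    names = (os.path.basename(first.name), os.path.basename(second.name))
print(names)
```

Because the loser of a race simply gets an exception and moves on to the next suffix, no separate os.path.exists() check is needed.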

Elrond
RichieHindle
  • 7
    On Linux/Python 2.6, `os.fdopen(..)` throws an `OSError` with `errno` 22 for the above example if the `mode` argument is left at its default (`'r'`). `f = os.fdopen(fd,"w")` however works. – Andre Holzner Oct 14 '11 at 21:24
7

If you are concerned about a race condition, you can create a temporary file and then rename it.

>>> import os
>>> import tempfile
>>> f = tempfile.NamedTemporaryFile(delete=False)
>>> f.name
'c:\\users\\hughdb~1\\appdata\\local\\temp\\tmpsmdl53'
>>> f.write(b"Hello world")
11
>>> f.close()
>>> os.rename(f.name, r'C:\foo.txt')
>>> if os.path.exists(r'C:\foo.txt'):
...     print('File exists')
...
File exists
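
One caveat: on POSIX systems os.rename() silently replaces an existing destination, so the rename step by itself does not detect a collision there (on Windows it raises an error instead). A possible variation, sketched here as a swapped-in technique rather than the method shown above, publishes the temporary file with os.link(), which fails if the destination already exists. It assumes Python 3, a filesystem that supports hard links, and a temporary file created in the same directory as the target; the helper name `publish_exclusive` is hypothetical.

```python
import os
import tempfile


def publish_exclusive(data, target):
    """Write data to a temp file, then give it the name `target` only if free.

    Returns True on success, False if `target` already existed.
    """
    directory = os.path.dirname(os.path.abspath(target))
    # Create the temp file next to the target so both are on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        try:
            os.link(tmp_path, target)  # raises FileExistsError if target exists
        except FileExistsError:
            return False
        return True
    finally:
        os.unlink(tmp_path)  # drop the temporary name in either case


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo.txt")
first = publish_exclusive(b"Hello world", target)
second = publish_exclusive(b"other data", target)
with open(target, "rb") as f:
    content = f.read()
print(first, second, content)
```

The file only ever appears under its final name fully written, and a second writer gets False back instead of clobbering the first one's data.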

Alternatively, you can create the files using a uuid in the name. Stackoverflow item on this.

>>> import uuid
>>> str(uuid.uuid1())
'64362370-93ef-11de-bf06-0023ae0b04b8'
Community
hughdbrown
  • 1
    I am checking to see if it exists; I'm worried about the race condition described above. TemporaryFile doesn't have delete as a parameter; NamedTemporaryFile does, though (in v2.6). Thanks for the pointer to this part of the Python library, which I did not know existed. The UUID approach would probably work but seems a bit exotic for what I really need. – boatcoder Aug 28 '09 at 17:28
0

If you have an id associated with each thread / process that tries to create the file, you could put that id in the suffix somewhere, thereby guaranteeing that no two processes can use the same file name.

This eliminates the race condition between the processes.
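
A rough sketch of this idea, assuming Python 3 and using os.getpid() and threading.get_ident() as the ids (the helper name `per_worker_name` and the name layout are made up for illustration). Note that these ids are unique only among workers running at the same moment on one machine and can be reused later, so the sketch still opens the file in exclusive mode as a safety net.

```python
import os
import tempfile
import threading


def per_worker_name(base, ext=""):
    """Build a name containing the current process id and thread id."""
    return "%s_%d_%d%s" % (base, os.getpid(), threading.get_ident(), ext)


workdir = tempfile.mkdtemp()
path = per_worker_name(os.path.join(workdir, "upload"), ".dat")
# "x" still guards against a leftover file from an earlier worker
# that happened to be assigned the same ids.
with open(path, "x") as f:
    f.write("payload")
print(os.path.basename(path))
```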

tgray
  • This may be a valid assumption for local (non-networked) filesystems on UNIX-like systems. Of course, there are other concerns if the open() might be executed on older versions of NFS, older Linux or other OS kernels, etc.: http://stackoverflow.com/questions/3406712/open-o-creat-o-excl-on-nfs-in-linux – Jim Dennis Aug 19 '13 at 23:20
0

You might as well use something like this for file name checking. You provide the name of the file and, optionally, the extension of the file that you want to create. If a file with the same name is already present in the current working directory, it returns the name with an index appended in parentheses, such as name(1); otherwise it returns the name unchanged.

import os

def nameIndexGenerator(name, fileExtension=''):
    # Build the extension suffix once so both cases share the same logic.
    suffix = f'.{fileExtension}' if fileExtension else ''
    candidate = f'{name}{suffix}'
    i = 1
    # Try name, name(1), name(2), ... until one does not exist yet.
    # This only checks for existence; it does not create the file.
    while os.path.exists(candidate):
        candidate = f'{name}({i}){suffix}'
        i += 1
    return candidate
Panos
  • This ignores race conditions entirely. You would be better off creating a file with a uuid-based name and then renaming it, using the failures to generate a new name. The amount of time between os.path.exists and returning the filename (and then creating the file) is too great on a heavily loaded system for this to be rock solid. – boatcoder May 20 '23 at 00:11