17

I have a urllib2 caching module, which sporadically crashes because of the following code:

if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)

The problem is, by the time the second line is being executed, the folder may exist, and will error:

  File ".../cache.py", line 103, in __init__
    os.mkdir(self.cache_location)
OSError: [Errno 17] File exists: '/tmp/examplecachedir/'

This is because the script is simultaneously launched numerous times, by third-party code I have no control over.

The code (before I attempted to fix the bug) can be found here, on github

I can't use the tempfile.mkstemp, as it solves the race condition by using a randomly named directory (tempfile.py source here), which would defeat the purpose of the cache.

I don't want to simply discard the error, as the same error Errno 17 error is raised if the folder name exists as a file (a different error), for example:

$ touch blah
$ python
>>> import os
>>> os.mkdir("blah")
Traceback (most recent call last):
  File "", line 1, in 
OSError: [Errno 17] File exists: 'blah'
>>>

I cannot using threading.RLock as the code is called from multiple processes.

So, I tried writing a simple file-based lock (that version can be found here), but this has a problem: it creates the lockfile one level up, so /tmp/example.lock for /tmp/example/, which breaks if you use /tmp/ as a cache dir (as it tries to make /tmp.lock)..

In short, I need to cache urllib2 responses to disc. To do this, I need to access a known directory (creating it, if required), in a multiprocess safe way. It needs to work on OS X, Linux and Windows.

Thoughts? The only alternative solution I can think of is to rewrite the cache module using SQLite3 storage, rather than files.

Esteban Küber
  • 36,388
  • 15
  • 79
  • 97
dbr
  • 165,801
  • 69
  • 278
  • 343

5 Answers5

11

Instead of

if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)

you could do

try:
    os.makedirs(self.cache_location)
except OSError:
    pass

As you would end up with the same functionality.

DISCLAIMER: I don't know how Pythonic this might be.


Using SQLite3, might be a bit of overkill, but would add a lot of functionality and flexibility to your use case.

If you have to do a lot of "selecting", concurrent inserting and filtering, it's a great idea to use SQLite3, as it wont add too much complexity over simple files (it could be argued that it removes complexity).


Rereading your question (and comments) I can better understand your problem.

What is the possibility that a file could create the same race condition?

If it is small enough, then I'd do something like:

if not os.path.isfile(self.cache_location):
    try:
        os.makedirs(self.cache_location)
    except OSError:
        pass

Also, reading your code, I'd change

else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise OSError(e)

to

else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise

as it's really what you want, Python to reraise the exact same exception (just nitpicking).


One more thing, may be this could be of use for you (Unix-like only).

Community
  • 1
  • 1
Esteban Küber
  • 36,388
  • 15
  • 79
  • 97
  • 1
    Agreed, though you might consider using `os.makedirs` instead, which create parent directories as well, if necessary. – Eli Courtwright Oct 19 '09 at 02:03
  • For many cases, ignoring the error would be Pythonic, but in this case it could cause other problems, as I addressed in the question - "I don't want to simply discard the error, as the same error Errno 17 error is raised if the folder name exists as a file (a different error)" - discarding the exception would mean the temp directory either doesn't exist (say, a permission denied error), or is a useless file (and would cause errors further down the path) – dbr Oct 19 '09 at 02:09
  • 1
    Also, I agree with your points about SQLite - I wouldn't have to do much in the way of filtering (pretty much just "has this URL been requested in the last x hours"), but SQLite would remove a lot of the complexity (no constructing cached-file paths and checking file modification times), and I think it should deal with all the locking issues (both for specific bits of cached data, and the sqlite file itself).. Hm... If Pythonic Metaphor's solution doesn't work, I may start this (in fact I may regardless, a nice SQLite backed urllib2 cache module would be useful) – dbr Oct 19 '09 at 02:15
  • 2
    @dbr: If you want to distinguish between "directory already exists as directory" and "directory name already exists as file", you do that in the except clause. Don't make this too complex. Simply handle the exception. – S.Lott Oct 19 '09 at 10:35
  • Thanks for the ideas. Catching OSError, checking if the dir exists and if so ignoring the error seems to work fine, and will still correctly error if the user has insufficient permissions. I've made the change you suggested to the code, I completely forgot `raise` will re-raise the caught exception. – dbr Oct 19 '09 at 12:59
10

The code I ended up with was:

import os
import errno

folder_location = "/tmp/example_dir"

try:
    os.mkdir(folder_location)
except OSError as e:
    if e.errno == errno.EEXIST and os.path.isdir(folder_location):
        # File exists, and it's a directory,
        # another process beat us to creating this dir, that's OK.
        pass
    else:
        # Our target dir exists as a file, or different error,
        # reraise the error!
        raise
maazza
  • 7,016
  • 15
  • 63
  • 96
dbr
  • 165,801
  • 69
  • 278
  • 343
3

In Python 3.x, you can use os.makedirs(path, exist_ok=True), which will not raise any exception if such directory exists. It will raise FileExistsError: [Errno 17] if a file exists with the same name as the requested directory (path).

Verify it with:

import os

parent = os.path.dirname(__file__)

target = os.path.join(parent, 'target')

os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)

os.rmdir(target)

with open(target, 'w'):
    pass

os.makedirs(target, exist_ok=True)
ikamen
  • 3,175
  • 1
  • 25
  • 47
2

Could you catch the exception and then test whether the file exists as a directory or not?

pythonic metaphor
  • 10,296
  • 18
  • 68
  • 110
  • Possibly! I thought of that just as I was re-reading the question, before submitting it.. I've implemented this ( http://github.com/dbr/tvdb_api/blob/468d9f816373b14ef3a483fca07e031b69fa62f9/cache.py#L103-114 ), and will get the person who reported the bug to test it shortly. – dbr Oct 19 '09 at 02:04
  • 1
    @dbr: note that on line 114, you want `raise e` since it's already an instance of `OSError` . http://github.com/dbr/tvdb_api/blob/468d9f816373b14ef3a483fca07e031b69fa62f9/cache.py#L114 – nosklo Oct 19 '09 at 15:33
1

When you have race conditions often EAFP(easier to ask forgiveness than permission) works better that LBYL(look before you leap)

Error checking strategies

John La Rooy
  • 295,403
  • 53
  • 369
  • 502
  • This all depends on likelihood of conflicts - EAFP = optimistic concurrency, which checks for conflict after trying the operation, and works well if there is small probability of conflicts. LBYL = pessimistic concurrency which checks before operation and is better if there are many conflicts typically. For a simple case like this I think pessimistic is better. – RichVel Jan 26 '16 at 16:00
  • @RichVel, with LBYL here you have the possibility of a race condition. So you need extra code to handle the exception anyway. So it's only a simple operation and the extra code isn't much now, but these things have a way of growing over the life of the project. There is an extra cost every time someone needs to read/understand/debug this extra code. I think it's likely to be a premature and unnecessary optimisation in this case. – John La Rooy Jan 27 '16 at 00:34
  • That's not the case, because mkdir is atomic i.e. it is equivalent to a single 'test-and-set' operation. Coding LBYL operations with non-atomic test and set operations will cause race conditions, but that's an implementation error. – RichVel Jan 27 '16 at 17:22