0

Please consider the following code:

def readPath(path):
   content = None
   if os.path.isfile(path):
      f = open(path,"rb")
      content = f.read()
      f.close()
   return content

versus this one:

def readPath(path):
   content = None
   try:
      f = open(path,"rb")
      content = f.read()
      f.close()
   except:
     pass
   return content

Given that the def is called many (hundreds to thousands times) in succession, mostly with valid paths (that represent actual files on the file system) but sometimes with non existent paths, which version is more efficient? Is checking a condition prior to opening the file is slower than setting a try block?

vaultah
  • 44,105
  • 12
  • 114
  • 143
ACEGL
  • 149
  • 9

1 Answers1

6

Usually, the answer to "whichever to use, if or try", the answer is EAFP - it is easier to ask for forgiveness than permission, thus always prefer try over if. However, in this case, neither EAFP nor performance considerations shouldn't be reason to choose one over another. Instead you should choose the one that is correct.

Using isfile because it makes your code prone to race conditions. Suppose someone removes the file just after you called isfile and before you actually opened it - and you would get a spurious exception. However, there have been countless of security bugs because of code that first checks for existence, permissions, ownership or so on, of a file, and then open it - an attacker may change the file that the link is pointing to in between.

Furthermore, there are other reasons why open would fail than just the file not existing:

>>> os.path.isfile('/etc/shadow')
True
>>> open('/etc/shadow')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
PermissionError: [Errno 13] Permission denied: '/etc/shadow'

isfile also amounts to an extra system call and that in itself is more expensive than catching an exception; just doing the system call alone makes the overhead of catching an exception minuscule in comparison. As you have stated that you expect most of the file names to actually exist, the time spent in isfile is just extra overhead.

I did some timings with Python 3 on Ubuntu 16.04, and os.path.isfile for a non-existent file took /etc/foo ~2 µs (it is actually faster for an existing file /etc/passwd, 1.8 µs); trying to open a non-existent file and failing and catching the IOError took ~3 µs - so checking for the existence of a file using os.path.isfile is faster than checking for its existence using open; but that's not what your code needs to do: it needs to read the contents of a file that can be read, and return its contents, and for that, if 66 % of the files are expected to exist, then open without checking is absolutely faster than using isfile to check first, so this should be a no-brainer.


P.S: your code possibly leaks an open file on other Pythons than CPython, and is unnecessarily complicated. Using the with block, this becomes much cleaner:

def read_path(path):
    try:
        with open(path, "rb") as f:
           return f.read()
    except IOError:
        return None
  • 1
    Thanks. A race condition is not possible in my setup, however it is a very good general point. Just to make it clearer I guess you mean that performance considerations shouldn't be the reason for NOT choosing the os.path.isfile approach. – ACEGL Sep 24 '16 at 22:37
  • @ACEGL updated; not even preferring performance over correctness, allowing occasional crash etc, would make using `isfile` preferable over `try`-`open` in your scenario. – Antti Haapala -- Слава Україні Sep 25 '16 at 07:34