6

Consider these two Python programs:

script_a.py:

from datetime import datetime
from time import sleep

while True:
    sleep(1)
    with open('foo.txt', 'w') as f:
        sleep(3)
        s = str(datetime.now())
        f.write(s)
        sleep(3)

script_b.py:

while True:
    with open('foo.txt') as f:
        s = f.read()
        print(s)

Run script_a.py. While it is running, start script_b.py. Both will happily run, but script_b.py outputs an empty string if the file is currently opened by script_a.py.

I was expecting an IOError to be raised, telling me the file is already open, but that didn't happen; instead, the file looks empty. Why is that, and what is the proper way to check whether the file is open in another process? Would it be OK to simply check whether an empty string is returned and retry until something else is read, or is there a more Pythonic way?

soerface
  • Only Windows locks files when open for writing. POSIX platforms do not; this is not a Python problem. It is instead a feature of your OS. – Martijn Pieters Mar 24 '14 at 18:17
  • You'll have to use locks instead; explicitly lock the file for writing by one of the processes. – Martijn Pieters Mar 24 '14 at 18:18
  • Oh, I didn't know that this has something to do with the OS, thanks! If so, I wonder if the question still belongs on stackoverflow… – soerface Mar 24 '14 at 18:20

2 Answers

3

You are allowed to open a file as many times as you want, so long as the operating system doesn't stop you. This is occasionally useful to get multiple cursors into a file for complex operations.

The reason that script_b.py thinks that the file is empty is that the file is empty:

with open('foo.txt', 'w') as f:

Opening a file in w mode immediately truncates it to zero length. There's an initial three-second gap in script_a during which the file is completely empty, and that's what script_b sees.

In the next three second gap after you call f.write, the file is still... probably empty. This is due to buffering - the file on disk is not guaranteed to contain everything that you have written to it with write until you either close (i.e. exit the context manager block) or manually invoke flush on the file handle.

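Alternatively, you can open the file in unbuffered mode, so that every write is handed to the operating system immediately and becomes visible to other readers.

with open('foo.txt', 'w', 0) as f:
    # No buffering: f.write() is passed straight to the OS.
    # Python 2 only; Python 3 allows buffering=0 only in binary mode ('wb').
    f.write(s)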
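
The truncation and buffering effects described above can be reproduced in a single process. This is only a sketch, written for Python 3 (the scripts in this thread are Python 2), and the helper names `read_all` and `observe_writer` are made up for illustration. It records what a second, independent handle reads at each stage of a 'w'-mode write.

```python
import os
import tempfile


def read_all(path):
    # A fresh handle each time, standing in for script_b's open()/read().
    with open(path) as f:
        return f.read()


def observe_writer(path):
    """Return what a separate reader sees at each stage of a 'w'-mode write."""
    with open(path, 'w') as f:
        f.write('old contents')      # leave something behind to be truncated
    seen = {}
    with open(path, 'w') as f:       # opening in 'w' mode truncates right away
        seen['after_open'] = read_all(path)
        f.write('new contents')      # small write, held in Python's buffer
        seen['after_write'] = read_all(path)
        f.flush()                    # hand the buffered data to the OS
        seen['after_flush'] = read_all(path)
    seen['after_close'] = read_all(path)
    return seen


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
    for stage, content in observe_writer(path).items():
        print(stage, repr(content))
```

With default buffering, the reader should get an empty string after the open and after the write, and the new contents only once flush (or close) has run.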
roippi
  • Both scripts open the file for reading, each `open()` truncates the file. Then there is buffering too; writes are not immediately visible to other readers of the file, not until the buffer has been flushed. – Martijn Pieters Mar 24 '14 at 18:25
  • Well `script_b`'s `open` doesn't truncate, since it's in `r` mode. You're right that I didn't address buffering in that second 3-second window in `script_a`. – roippi Mar 24 '14 at 18:46
3

See the other answer and comments regarding how multiple file opens work in Python. If you've read all that and still want to lock access to the file on a POSIX platform, you can use the fcntl module.

Keep in mind that: A) flock locks are advisory, so other programs may simply ignore your lock on the file; B) some networked file systems don't implement locking very well, or at all; C) you must be careful to release locks and avoid deadlock, as flock won't detect it [1][2].
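
On point C, one way to make the release hard to forget is to wrap the lock in a context manager, so it is dropped even if the body raises. This is a minimal sketch for Python 3; `flocked` is a hypothetical helper name, not something the fcntl module provides.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def flocked(f, mode=fcntl.LOCK_EX):
    """Hold an advisory flock on the open file f for the with block."""
    fcntl.flock(f, mode)             # blocks until the lock is granted
    try:
        yield f
    finally:
        if f.writable():
            f.flush()                # get buffered data out before unlocking
        fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
    # Append mode avoids truncating the file before the lock is held.
    with open(path, 'a') as f, flocked(f):
        f.truncate(0)
        f.write('written under lock')
    with open(path) as f, flocked(f, fcntl.LOCK_SH):
        print(f.read())
```

Flushing before the unlock matters: otherwise a reader could acquire the lock while the data is still sitting in the writer's buffer.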

Example:

script_a.py:

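from datetime import datetime
from time import sleep
import fcntl

while True:
    sleep(1)
    # Open in append mode: 'w' would truncate the file before the lock is held.
    with open('foo.txt', 'a') as f:
        s = str(datetime.now())

        print("%s Waiting for lock" % datetime.now())
        fcntl.flock(f, fcntl.LOCK_EX)
        print("%s Lock acquired, writing" % datetime.now())

        sleep(3)
        f.truncate(0)  # discard the old contents only while holding the lock
        f.write(s)
        f.flush()      # push the data to the OS before releasing the lock

        print("%s Releasing lock" % datetime.now())
        fcntl.flock(f, fcntl.LOCK_UN)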

script_b.py:

import fcntl
from datetime import datetime

while True:
    with open('foo.txt') as f:
        print("%s Getting lock" % datetime.now())
        # A shared lock is enough for a reader; it still waits for the writer.
        fcntl.flock(f, fcntl.LOCK_SH)
        print("%s Got lock, reading file" % datetime.now())

        s = f.read()

        print("%s Read file, releasing lock" % datetime.now())
        fcntl.flock(f, fcntl.LOCK_UN)

        print(s)
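
If you would rather check whether the file is locked than wait for it, flock also accepts LOCK_NB, which makes the call fail immediately instead of blocking. The sketch below assumes Python 3 and a writer that cooperates by taking LOCK_EX; `read_if_unlocked` is a hypothetical helper name.

```python
import fcntl
import os
import tempfile


def read_if_unlocked(path):
    """Return the file's contents, or None if someone else holds LOCK_EX."""
    with open(path) as f:
        try:
            # LOCK_NB turns "wait for the lock" into "fail right away".
            fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
    with open(path, 'w') as f:
        f.write('some data')
    print(read_if_unlocked(path))    # nobody holds a lock here

    holder = open(path, 'a')         # a second open, playing the writer
    fcntl.flock(holder, fcntl.LOCK_EX)
    print(read_if_unlocked(path))    # the exclusive lock is held elsewhere
    fcntl.flock(holder, fcntl.LOCK_UN)
    holder.close()
```

A caller can then retry on None, which is a more explicit signal than treating an empty string as "try again".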

Hope this helps!

mdadm