3

I am writing a script that is required to perform safe-writes to any given file i.e. append a file if no other process is known to be writing into it. My understanding of the theory was that concurrent writes were prevented using write locks on the file system but it seems not to be the case in practice.

Here's how I set up my test case: I am redirecting the output of a ping command:

ping 127.0.0.1 > fileForSafeWrites.txt

On the other end, I have the following python code attempting to write to the file:

handle = open('fileForSafeWrites.txt', 'w')
handle.write("Probing for opportunity to write")
handle.close()

Running concurrently both processes gracefully complete. I see that fileForSafeWrites.txt has turned into a file with binary content, instead of a write lock issued by the first process that protects it from being written into by the Python code.

How do I force either or both of my concurrent processes not to interfere with each other? I have read people advise the ability to get a write file handle as evidence for the file being write to safe, such as in https://stackoverflow.com/a/3070749/1309045

Is this behavior specific to my Operating System and Python. I use Python2.7 in an Ubuntu 12.04 environment.

Community
  • 1
  • 1
Spade
  • 2,220
  • 1
  • 19
  • 29
  • 1
    Where are you acquiring a lock on the file? It is perfectly acceptable for two files to write to a file at the same time an UNIX-like OSs, and that is exactly what will happen unless you explicitly prevent it. – kindall Jan 14 '14 at 23:22
  • Thanks, I am not acquiring a lock. I was expecting the former process to issue a write lock against the second one. And I am misled by http://stackoverflow.com/a/3070749/1309045 . If what you say is true, what is then an answer to http://stackoverflow.com/questions/3070210/need-a-way-to-determine-if-a-file-is-done-being-written-to – Spade Jan 14 '14 at 23:28

2 Answers2

2

Use the lockfile module as shown in Locking a file in Python

Community
  • 1
  • 1
James Mills
  • 18,669
  • 3
  • 49
  • 62
  • Works if I have control over both processes. What if you assume that one of the processes is external for which the programmer has no control over and cannot be asked to issue a lock? (The ping was meant to suggest this) – Spade Jan 14 '14 at 23:35
  • As suggested in the comments of your question, it's perfectly valid in UNIX-like OS(es) to concurrently write to files. This is how you achieve file locks (*one way*) in Python. If you have on control over the other process and want ACID file writes, you'll have to figure something else out. e.g: transactions, a database, an acid file system, etc – James Mills Jan 14 '14 at 23:37
1

Inspired from a solution described for concurrency checks, I came up with the following snippet of code. It works if one is able to appropriately predict the frequency at which the file in question is written. The solution is through the use of file-modification times.

import os
import time

'''Find if a file was modified in the last x seconds given by writeFrequency.'''
def isFileBeingWrittenInto(filename, 
                       writeFrequency = 180, overheadTimePercentage = 20):

    overhead = 1+float(overheadTimePercentage)/100 # Add some buffer time
    maxWriteFrequency = writeFrequency * overhead
    modifiedTimeStart = os.stat(filename).st_mtime # Time file last modified
    time.sleep(writeFrequency)                     # wait writeFrequency # of secs
    modifiedTimeEnd = os.stat(filename).st_mtime   # File modification time again
    if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency:
        return True
    else:
        return False

if not isFileBeingWrittenInto('fileForSafeWrites.txt'):
    handle = open('fileForSafeWrites.txt', 'a')
    handle.write("Text written safely when no one else is writing to the file")
    handle.close()

This does not do true concurrency checks but can be combined with a variety of other methods for practical purposes to safely write into a file without having to worry about garbled text. Hope it helps the next person searching for a way to do this.

EDIT UPDATE:

Upon further testing, I encountered a high-frequency write process that required the conditional logic to be modified from

if 0 < (modifiedTimeEnd - modifiedTimeStart) < maxWriteFrequency 

to

if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency 

That makes a better answer, in theory and in practice.

Community
  • 1
  • 1
Spade
  • 2,220
  • 1
  • 19
  • 29