1

On Linux, if I have parallel processes writing to the same file opened in w/w+ mode, is there any chance that the data written by the two processes gets mixed up? Or will the file always contain data from only one process at any given time, because w mode truncates the existing file?

user4906231
  • Not quite an answer, but relevant. http://stackoverflow.com/questions/12942915/understanding-concurrent-file-writes-from-multiple-processes – Goodies Dec 06 '16 at 19:29
  • The link you provided, if I understood correctly, deals with appending, not overwriting as in my example. – user4906231 Dec 06 '16 at 19:30
  • The issue here is not that multiple processes cannot write to the file, because they can. The issue is how file writing is implemented in Python or on that system. For example, `fopen` has its own internal buffer for I/O, so one stream's buffered data may overwrite the other's. In my case, the last stream to close got its contents into the file. However, you can let them overwrite each other throughout the document (one letter at a time) by calling `file.flush()`, with `file` being an `open` object. – Goodies Dec 06 '16 at 19:56
  • I see. That's why I asked about Python specifically on Linux. I assume Python implements it the same way regardless of the OS, right? If so, I'm curious what happens in that case. – user4906231 Dec 06 '16 at 19:59
  • I performed that test on Debian 8.1 if that matters. – Goodies Dec 06 '16 at 20:30

3 Answers

1

Suppose process A and process B both write to the same file (in w/w+ mode, not a/a+ append mode).

If B writes the file after A has edited it, A's edit will disappear.

If A writes the file after B has edited it, B's edit will disappear.

If B opens the file after A has edited it, the result depends on your program. There could be an error because of an unexpected edit made by A, or A's edit could again disappear. But edits will not accumulate unless you imitate append mode in your program.

The opposite is also true.

In short, the last writer wins.

Be aware that handling w/w+ mode asynchronously is not a good idea. But this "messed up" situation can only occur in append mode, not in write mode.
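The "last writer wins" case can be sketched as follows. This is a minimal illustration, assuming each writer opens, writes, and closes the file before the next one starts; the function name `sequential_writers` and the temporary path are made up for the example, and the sketch does not cover writers that hold the file open at the same time.

```python
import os
import tempfile

def sequential_writers(path):
    # Writer A: open in 'w' mode, write, and close.
    with open(path, 'w') as a:
        a.write('written by A, a fairly long line\n')
    # Writer B: opening in 'w' mode truncates the file to zero length
    # before anything is written, so A's content is discarded.
    with open(path, 'w') as b:
        b.write('written by B\n')
    # Read back what is left in the file.
    with open(path) as f:
        return f.read()

# Hypothetical scratch location, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
print(repr(sequential_writers(demo_path)))  # 'written by B\n'
```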

How do you append to a file?

mertyildiran
  • That's what I wanted to make sure. I only wanted to make sure that by using w/w+ I will not get messed up data. I don't care if something gets overwritten and deleted. – user4906231 Dec 06 '16 at 20:05
  • @user4906231 Yeah, I'm glad to see you are satisfied with my answer. – mertyildiran Dec 06 '16 at 20:08
  • @user4906231 No, on Windows there is a difference for binary files: https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files – mertyildiran Dec 06 '16 at 20:15
  • I see, but specifically for w/w+ modes it's the same, right? – user4906231 Dec 06 '16 at 20:16
  • @user4906231 `w` : Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing. `w+` : Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing. `wb/wb+` : same for binary. – mertyildiran Dec 06 '16 at 21:47
0

It all depends on how you open the file and what platform you open it on.

On Linux/Mac, the filesystem is based on inodes, and file names just point to a particular inode. So if you have some code that works like this:

import time
import threading

def write_one():
    f = open('test.txt', 'w')
    f.write('something longer ')
    time.sleep(0.5)
    f.write(' something more')

def write_two():
    f = open('test.txt', 'w')
    time.sleep(0.1)
    f.write('something shorter')

if __name__ == '__main__':
    t1 = threading.Thread(target=write_one)
    t2 = threading.Thread(target=write_two)

    t1.start()
    t2.start()

    t1.join()
    t2.join()
    print('Done')

What's going to happen is that one thread opens the file for writing and then when the next thread opens the file it's going to point to a new inode and just change where the filename points to.

If you have some code like this:

import time
import threading

f = open('test.txt', 'w')

def write_one():
    f.write('something longer ')
    time.sleep(0.5)
    f.write(' something more')

def write_two():
    time.sleep(0.1)
    f.write('something shorter')

if __name__ == '__main__':
    t1 = threading.Thread(target=write_one)
    t2 = threading.Thread(target=write_two)

    t1.start()
    t2.start()

    t1.join()
    t2.join()
    print('Done')

Both threads are going to have access to the same file object. You could probably do some weird things with copying that might work but I wouldn't rely on it.

If you're on Windows, however, the first method probably won't work. Windows doesn't like having the same file opened for writing by more than one handle at a time. It will complain loudly, probably by raising an exception.

Just don't do it. A better approach is to have one thread do the I/O; if another thread wants to write to the file, have it send the data to the I/O thread through a Queue or something similar.
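A minimal sketch of that single-writer arrangement, assuming Python 3 and the standard `queue` module. The names `writer`, `produce`, and `run_writers` are invented for the example, and `None` is used as an arbitrary stop signal.

```python
import os
import queue
import tempfile
import threading

def writer(path, q):
    # The only thread that touches the file.
    with open(path, 'w') as f:
        while True:
            item = q.get()
            if item is None:  # sentinel: no more data is coming
                break
            f.write(item)

def produce(q, lines):
    # Producers never open the file; they only enqueue text.
    for line in lines:
        q.put(line)

def run_writers(path, batches):
    q = queue.Queue()
    io_thread = threading.Thread(target=writer, args=(path, q))
    io_thread.start()
    producers = [threading.Thread(target=produce, args=(q, b)) for b in batches]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(None)  # all producers are done, tell the writer to stop
    io_thread.join()

# Hypothetical scratch location, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
run_writers(demo_path, [['one a\n', 'one b\n'], ['two a\n']])
```

Each queued string is written by a single `f.write` call, so lines from different producers may appear in any order but are not split by another producer's data.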

Wayne Werner
0

On Linux/Mac, …

What's going to happen is that one thread opens the file for writing and then when the next thread opens the file it's going to point to a new inode and just change where the filename points to.

This statement is wrong, and the program above it can easily be modified to show that. If we change both

    f = open('test.txt', 'w')

to

    f = open('test.txt', 'w', 0)

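(the third argument `0` turns buffering off; Python 2 accepts this in text mode, whereas Python 3 allows it only in binary mode), we won't be deceived by the output buffering (see How often does python flush to a file?) and will be able to get an output of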

something shorter something more

in test.txt that clearly shows that both open() calls open the same inode.
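For Python 3, the same experiment can be sketched without threads. This version assumes Linux, uses binary mode because Python 3 rejects `buffering=0` for text files, and issues the writes in the order the sleeps in the threaded program would produce. The function name `interleaved_writes` and the temporary path are made up for the example.

```python
import os
import tempfile

def interleaved_writes(path):
    # Two independent opens of the same name; each gets its own file offset.
    f1 = open(path, 'wb', buffering=0)
    f2 = open(path, 'wb', buffering=0)  # truncates, but does not create a new inode
    try:
        same_inode = (os.fstat(f1.fileno()).st_ino ==
                      os.fstat(f2.fileno()).st_ino)
        f1.write(b'something longer ')   # bytes 0..16 via f1
        f2.write(b'something shorter')   # bytes 0..16 again via f2 (its offset is 0)
        f1.write(b' something more')     # continues at f1's own offset, 17
    finally:
        f1.close()
        f2.close()
    with open(path, 'rb') as f:
        return same_inode, f.read()

# Hypothetical scratch location, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
print(interleaved_writes(demo_path))
# (True, b'something shorter something more')
```

The result contains data from both handles, which is only possible if both refer to the same underlying file.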

the data won't be able to get mingled between the two threads? They will always override each other, right?

As we can see, this is wrong.

Armali