When writing to an open file that I have shared by passing it to a worker function implemented with multiprocessing, the file's contents are not written properly. Instead, '^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^' is written to the file.

Why would this happen? Can you not have multiple processes writing to the same file? Do you need to use a Lock? A Queue? Am I not using multiprocessing correctly or effectively?

I feel like some example code might help; please treat it just as a reference showing that I open a file and pass the open file via multiprocessing to another function that writes to that file.

Multiprocessing file:

import multiprocessing as mp

class PrepWorker():
    def worker(self, open_file):
        for i in range(1, 1000000):
            data = GetDataAboutI()  # This function would be in a separate file
            open_file.write(data)
            open_file.flush()
        return

if __name__ == '__main__':
    open_file = open('/data/test.csv', 'w+')
    jobs = []
    for i in range(4):
        p = mp.Process(target=PrepWorker().worker, args=(open_file,))
        jobs.append(p)
        p.start()

    for j in jobs:
        j.join()
        print('{0}.exitcode = {1}'.format(j.name, j.exitcode))
    open_file.close()
  • "There are probably details in these code examples that are not needed." [MCVE] – ivan_pozdeev Dec 29 '15 at 07:30
  • Where do the `^@`'s come from? I cannot see anything like this in the code. Are these literals or a representation of control symbols? – ivan_pozdeev Dec 29 '15 at 07:33
  • @ivan_pozdeev I have no idea where the ^@ values are coming from... Every line that is written while running this is written as those repeating symbols. If I change the range to 1 and run just 1 process, the data is written perfectly. – ccdpowell Dec 29 '15 at 07:46
  • @ccdpowell: what happens if the PrepWorkers each write a fixed character (determined at random by each worker)? – serv-inc Dec 29 '15 at 07:48
  • @ccdpowell But you can view the file in hex to answer the 2nd question. – ivan_pozdeev Dec 29 '15 at 07:53
  • @ivan_pozdeev I can view the file I am writing to using the terminal command less. That is how I see the ^@ symbols. – ccdpowell Dec 29 '15 at 07:57
  • Ha! `less` replaces nonprintable characters (to, well, let you see them). Use `xxd` to see their actual ASCII codes. – ivan_pozdeev Dec 29 '15 at 07:58
  • Based on @user's question, I ran the random-string test and was able to clarify the problem a little more. A ^@ is written where there should be a character, for every process EXCEPT the last one. In my example, if I ran this with 4 processes, each processing 10 items, I would have a string of 30 '^@' followed by 10 readable characters. – ccdpowell Dec 29 '15 at 07:59
  • @ivan_pozdeev Didn't know that about less! ha. The ^@ symbols show up as periods and are labeled 0000 when viewed using xxd. – ccdpowell Dec 29 '15 at 08:04
  • Have you seen http://stackoverflow.com/questions/18412776/concurrent-writing-to-the-same-file-using-threads-and-processes? – serv-inc Dec 29 '15 at 08:12

1 Answer


Why would this happen?

There are several processes, all of which may try to call

open_file.write(data)
open_file.flush()

at the same time. Which behavior would you consider correct if something like

  • a.write
  • b.write
  • a.flush
  • c.write
  • b.flush

happens? (The `^@` characters you saw are NUL bytes; they are the typical sign that buffered writes from several processes landed at overlapping or stale file offsets, leaving gaps in the file that read back as zeros.)
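
The most direct fix for that interleaving is to make each write+flush pair atomic with a shared multiprocessing.Lock. A minimal sketch (the path, row format, and loop counts are placeholders, not your GetDataAboutI logic; note each worker opens its own append-mode handle instead of sharing the parent's file object):

import multiprocessing as mp

def locked_worker(lock, path, worker_id):
    # Each process opens its own append-mode handle; the shared lock
    # makes every write+flush pair atomic, so pairs cannot interleave.
    with open(path, 'a') as f:
        for i in range(10):
            with lock:
                f.write('worker {0}, item {1}\n'.format(worker_id, i))
                f.flush()

if __name__ == '__main__':
    lock = mp.Lock()
    jobs = [mp.Process(target=locked_worker, args=(lock, '/tmp/test.csv', n))
            for n in range(4)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()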

Can you not have many multiprocessing units writing to the same file? Do you need to use a Lock? A Queue?

Python multiprocessing safely writing to a file recommends having one queue, which is read by a single process that then writes to the file. So do Writing to a file with multiprocessing and Processing single file from multiple processes in python.
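
A minimal sketch of that pattern (again with a placeholder path and row format): the workers only put rows on a queue, and a single writer process is the only one that ever touches the file:

import multiprocessing as mp

def producer(queue, worker_id):
    # Producers never touch the file; they only enqueue finished rows.
    for i in range(10):
        queue.put('worker {0}, item {1}\n'.format(worker_id, i))

def writer(queue, path):
    # The single consumer owns the file, so its writes cannot interleave.
    with open(path, 'w') as f:
        while True:
            row = queue.get()
            if row is None:  # sentinel: every producer has finished
                return
            f.write(row)

if __name__ == '__main__':
    queue = mp.Queue()
    w = mp.Process(target=writer, args=(queue, '/tmp/test.csv'))
    w.start()
    producers = [mp.Process(target=producer, args=(queue, n)) for n in range(4)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    queue.put(None)  # all producers are done; tell the writer to stop
    w.join()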

  • Thank you. This is what I needed. I was trying to do too much with the data; the processes were overlapping their writes in between flushes. This problem stemmed from a fundamental misunderstanding of how to structure multiprocessing jobs. – ccdpowell Jan 13 '16 at 16:52
  • Why does using a Lock to prevent the write/flush from different processes interleaving not work? I distribute counters using the approach described at http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing and figured the same approach should work for an open file. – ccdpowell Jan 13 '16 at 18:35
  • @ccdpowell: The code at the website you linked to seems fine. Without looking at yours, it's hard to say. How about you ask a new question with the modified code? (Feel free to ping this comment if it's not immediately answered) – serv-inc Jan 14 '16 at 15:08