I have been experimenting with multiprocessing on and off for months, trying to find an elegant, repeatable way to let multiple processes write to the same file without corrupting each other's output.

In the past I have used the multiprocessing producer/consumer pattern to get over these hurdles. Good articles and posts I've found include:

I've also tried implementing a function similar to the shared counter described here:

I have become a big fan of the simplicity of concurrent.futures.ProcessPoolExecutor and of calling map on the executor, as described here:

Tonight I thought I had found the answer when I discovered a module called fasteners that provides reader-writer locks, but apparently its locks only work with threads, not processes.

QUESTION: Is there an elegant, simple way to share a lock so that the processes spawned by ProcessPoolExecutor do not overwrite each other when writing to a file?

NOTE: I'm writing about 800M rows of ~200 fields to one file using csv.DictWriter. Other recommendations are welcome.

ccdpowell
  • I just tried using global variables, setting the lock as a global variable if it is not already set. Still open to better solutions. Maybe joblib? – ccdpowell Feb 25 '16 at 07:43
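That global-variable idea can be expressed through ProcessPoolExecutor's initializer hook (available since Python 3.7). Below is a minimal sketch, not the poster's actual code; the names init_worker, write_row, OUTPUT_PATH, and FIELDNAMES are hypothetical:

    import csv
    import multiprocessing
    from concurrent.futures import ProcessPoolExecutor

    OUTPUT_PATH = "out.csv"       # hypothetical output file
    FIELDNAMES = ["id", "value"]  # hypothetical field names

    lock = None  # populated in each worker by init_worker()

    def init_worker(shared_lock):
        # Stash the inherited lock in a module-level global so the
        # worker function can reach it without pickling it per task.
        global lock
        lock = shared_lock

    def write_row(row):
        # Hold the shared lock while appending, so writes from
        # different processes cannot interleave.
        with lock:
            with open(OUTPUT_PATH, "a", newline="") as f:
                csv.DictWriter(f, fieldnames=FIELDNAMES).writerow(row)

    if __name__ == "__main__":
        # Write the header once, before any workers start.
        with open(OUTPUT_PATH, "w", newline="") as f:
            csv.DictWriter(f, fieldnames=FIELDNAMES).writeheader()

        shared_lock = multiprocessing.Lock()
        rows = ({"id": i, "value": i * i} for i in range(100))
        with ProcessPoolExecutor(initializer=init_worker,
                                 initargs=(shared_lock,)) as executor:
            list(executor.map(write_row, rows))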

1 Answer


You are looking at the problem from the wrong angle. Instead of sharing a lock to protect access to the file, give file access to a single process. The other processes simply tell it what to write.
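A minimal sketch of that single-writer pattern, using a multiprocessing.Queue and a sentinel to shut the writer down; the names worker, writer, and out.csv are placeholders, not code from the linked questions:

    import csv
    import multiprocessing

    FIELDNAMES = ["id", "value"]  # hypothetical field names
    SENTINEL = None               # tells the writer to stop

    def worker(task, queue):
        # Do the work, then hand the result to the writer
        # instead of touching the file directly.
        queue.put({"id": task, "value": task * task})

    def writer(queue, path):
        # The only process that ever opens the file, so no lock
        # is needed and rows cannot interleave.
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=FIELDNAMES)
            w.writeheader()
            for row in iter(queue.get, SENTINEL):
                w.writerow(row)

    if __name__ == "__main__":
        queue = multiprocessing.Queue()
        writer_proc = multiprocessing.Process(target=writer,
                                              args=(queue, "out.csv"))
        writer_proc.start()

        workers = [multiprocessing.Process(target=worker, args=(i, queue))
                   for i in range(10)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()

        queue.put(SENTINEL)  # all workers are done; shut the writer down
        writer_proc.join()

Because only the writer process ever opens the file, the workers never contend for it, and no locking is required at all.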

From that perspective, there are plenty of questions similar to yours on Stack Overflow:

Python multiprocessing safely writing to a file

Writing to a file with multiprocessing

Python Multiprocessing using Queue to write to same file

noxdafox