I have an embarrassingly parallel problem: the function to parallelize shares no memory state, but each call needs to add a line to a CSV file. Lines can be added to the file in any order. The whole run can take a long time, so we need to be able to read the progress from the CSV file while it runs.

Is it safe, or better, to use a Pool with a global Lock passed through the initializer, rather than (as described in [1]) a Queue fed by the worker processes with a single process writing to the CSV file?

[1] Python multiprocessing safely writing to a file

::

from random import random
from time import sleep, time
from multiprocessing import Pool, Lock
import os

def add_to_csv(line, fd='/tmp/a.csv'):
    pid = os.getpid()
    with lock:
        with open(fd, 'a') as csvfile:
            sleep(1)
            csvfile.write(line)
    print '    line added by {}'.format(pid)

def f(x):
    start = time()
    pid = os.getpid()
    print '=> pi: {} started'.format(pid)
    sleep(6*random())
    res = 2*x
    print 'pi: {} res {} in {:2.2}s'.format(pid, res, time() - start)
    add_to_csv(str(res) + '\n')
    return res

def init(l):
    global lock
    lock = l

if __name__ == '__main__':
    sleep(2)
    lock = Lock()
    pool = Pool(initializer=init, initargs=(lock,))
    out = pool.map(f, [1, 2, 3, 4])
    print out

Running it gives this output::

=> pi: 521 started
=> pi: 522 started
=> pi: 523 started
=> pi: 524 started
pi: 521 res 2 in 1.3s
    line added by 521
pi: 523 res 6 in 3.4s
    line added by 523
pi: 524 res 8 in 5.2s
pi: 522 res 4 in 5.4s
    line added by 524
    line added by 522
[2, 4, 6, 8]
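
For comparison, the queue alternative from [1] could be sketched roughly as below. This is only a minimal sketch, not the code from [1]: the names `SENTINEL`, `init_worker`, `work`, `writer` and `run` are made up for illustration, the queue is handed to the workers through the same initializer trick used for the lock, and the file lives in a throwaway temporary directory.

```python
import multiprocessing as mp
import os
import tempfile

SENTINEL = None  # hypothetical end-of-work marker (not from the question)


def init_worker(q):
    # Same initializer pattern as the lock version: give every worker the queue.
    global queue
    queue = q


def work(x):
    res = 2 * x
    queue.put(str(res) + '\n')  # workers only enqueue; they never open the file
    return res


def writer(q, path):
    # The single process that owns the file, so no lock is needed.
    with open(path, 'a') as csvfile:
        for line in iter(q.get, SENTINEL):
            csvfile.write(line)
            csvfile.flush()  # lets a reader follow progress while work continues


def run(values, path):
    q = mp.Queue()
    w = mp.Process(target=writer, args=(q, path))
    w.start()
    pool = mp.Pool(initializer=init_worker, initargs=(q,))
    try:
        return pool.map(work, values)
    finally:
        pool.close()
        pool.join()
        q.put(SENTINEL)  # all workers have exited, tell the writer to stop
        w.join()


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'a.csv')  # throwaway location
    print(run([1, 2, 3, 4], path))
```

In this sketch only `writer` touches the file, so the workers never wait on each other for file access. The cost is one extra process and the sentinel needed to shut it down.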
user3313834
  • Another approach: have each process write to its own `.csv` file. When the program is done, paste those files together into one `.csv` file. Then each worker process can run with _no_ sync overheads for the duration. – Tim Peters Oct 01 '16 at 18:05
  • If you use a lock, you also have to flush the file on every write, which is expensive. I'd stick with a queue to a file writer. – tdelaney Oct 01 '16 at 19:27
  • Hi `Tim Peters`, I've updated the spec to better explain the use case. – user3313834 Oct 02 '16 at 15:20
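
The one-file-per-process idea from the first comment might look something like the sketch below. It is an assumption-laden illustration, not anyone's actual code: the `part-<pid>.csv` naming scheme and the helpers `init_worker`, `work`, `merge` and `run` are invented here, and everything is written into a temporary directory.

```python
import glob
import multiprocessing as mp
import os
import shutil
import tempfile


def init_worker(directory):
    # Each worker remembers its own part file, named after its pid
    # (hypothetical naming scheme, chosen only for this sketch).
    global part_path
    part_path = os.path.join(directory, 'part-{}.csv'.format(os.getpid()))


def work(x):
    res = 2 * x
    with open(part_path, 'a') as part:  # private file, so no lock is needed
        part.write(str(res) + '\n')
    return res


def merge(directory, target):
    # Concatenate every part file into the final csv once the work is done.
    with open(target, 'w') as out:
        for name in sorted(glob.glob(os.path.join(directory, 'part-*.csv'))):
            with open(name) as part:
                shutil.copyfileobj(part, out)


def run(values, directory, target):
    pool = mp.Pool(initializer=init_worker, initargs=(directory,))
    try:
        results = pool.map(work, values)
    finally:
        pool.close()
        pool.join()
    merge(directory, target)
    return results


if __name__ == '__main__':
    directory = tempfile.mkdtemp()  # throwaway location
    target = os.path.join(directory, 'a.csv')
    print(run([1, 2, 3, 4], directory, target))
```

With this layout the merged file only exists at the end, so following progress during the run would mean looking at all the part files (for example counting their lines) rather than at a single CSV.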
