I wrote this script in Python to modify a worksheet shared between processes:

import multiprocessing as mp
import openpyxl as op

def write(i):
    i += 1
    workbook = op.load_workbook('prova.xlsx')
    worksheet = workbook['Sheet']
    worksheet.cell(row=i, column=1, value=i * i)
    workbook.save('prova.xlsx')
    return i


if __name__ == '__main__':

    workbook = op.Workbook()
    workbook.save('prova.xlsx')
    num_cpu = mp.cpu_count()
    pool = mp.Pool(processes=num_cpu)

    for i in range(10):
        result = pool.apply_async(write, args=[i]).get()

    pool.close()
    pool.join()

Is this an efficient method, or is there a better way to do it?

turuloru

1 Answer

Your loop spawning the processes is unfortunately self-defeating: apply_async returns an AsyncResult whose get() method blocks until a result is available, so your code is in effect working no differently than if you were not using multiprocessing at all. A simple workaround is to store each AsyncResult in a collection and iterate over those after the initial loop:

results = []

for i in range(10):
    results.append(pool.apply_async(write, args=[i]))

for r in results:
    result = r.get()

pool.close()
pool.join()

Also, the openpyxl documentation makes no mention of Workbooks being safe for concurrent access, meaning each process can make changes to the same data which may be overwritten by another process. It looks like you are being "safe enough" since you are passing a unique index to each new process; however, if you intend to increase the complexity of your program to do more, it would be in your interest to leverage an Event to avoid race conditions:

def write(i, event):
    event.wait()   # block until the event is set by another worker
    event.clear()  # un-set it so other workers block at wait()

    # workbook operations

    event.set()    # let the next waiting worker proceed
joshmeranda
  • Thanks for the answer. Honestly, I don't need any result from the child processes; I only use pool.apply_async().get() because if I use pool.apply_async() alone, only some values are written. Since you told me it is not executing asynchronously, maybe that means Workbooks are not thread-safe. So I should remove .get() and use an Event as you advise. Am I right? – turuloru Mar 10 '21 at 22:59
  • I'd guess not; the pool may not have started running the function yet, since as far as the interpreter knows you do not need it to have run yet. So you will probably still need to call `get`, or better `wait` since you do not need the return value. Replace the second loop in my example above with `i.wait()` and you should be all set – joshmeranda Mar 10 '21 at 23:55
  • I replaced the second loop with `i.wait()` but I still have the same problem. If I use `results.append(pool.apply_async(write,args=[i]))`, it still writes only 2 or 3 random values into the Workbook, and when I open the Excel file it tells me it is corrupted. I also tried with an Event but the result is the same. – turuloru Mar 11 '21 at 00:25
  • Huh, weird. Looking at the docs it doesn't say the two should function any differently, but if `get()` works where `wait()` doesn't, use it instead; you'll only incur a small performance hit. – joshmeranda Mar 11 '21 at 01:08
  • As a sanity check you could call [`ready()`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult.ready) after `wait()` just to verify that the method call has completed – joshmeranda Mar 11 '21 at 01:09
  • I resolved it using `multiprocessing.Lock()` and its methods `acquire()` and `release()` – turuloru Mar 11 '21 at 14:48
  • Sounds good; a Lock is the better approach, since it gives one process exclusive control rather than potentially many. It should've been my suggestion, glad you figured things out – joshmeranda Mar 11 '21 at 15:00