
Let us consider the following code, where I calculate the factorials of 4 really large numbers, saving each output to a separate .txt file (out_mp_{idx}.txt). I use multiprocessing (4 processes) to reduce the computation time. Though this works fine, I want to output all 4 results in one file.

One way is to open each of the 4 generated files and append them to a new file, but that's not my choice (the code below is just a simplified version of mine; I have too many files to handle, which defeats the purpose of saving time via multiprocessing). Is there a better way to automate this so that the results from all the processes are dumped/appended to a single file? Also, in my case the result returned from each process can span several lines, so how do we avoid an open-file conflict when one process is appending its results to the output file and a second process finishes and wants to open/access the same file?

As an alternative, I tried the Pool.imap route, but that's not as computationally efficient as the code below. Something like this SO post.

from multiprocessing import Process
import os
import time
 
tic = time.time()
 
def factorial(n, idx):  # function to calculate the factorial
 
    num = 1
    while n >= 1:
        num *= n
        n = n - 1
 
    with open(f'out_mp_{idx}.txt', 'w') as f0:  # saving output to a separate file
        f0.write(str(num))
 
def My_prog():
 
    jobs = []
    N = [10000, 20000, 40000, 50000]  # numbers for which factorial is desired
    n_procs = 4
     
    # executing multiple processes
    for i in range(n_procs):  
        p = Process(target=factorial, args=(N[i], i))
        jobs.append(p)
 
    for j in jobs:
        j.start()
 
    for j in jobs:
        j.join()
 
    print(f'Exec. Time:{time.time()-tic} [s]')
 
if __name__=='__main__':
    My_prog()
nuki
  • I have no idea why you say using `imap` or even `map` is not as computationally efficient as the code you show. It is *precisely* what you want to use. You want to use a pool and return all the results back to the main process, which can then write the 4 results to a single file. What is more efficient than that? Your `factorial` function is then only concerned with computing the value and returning the result. – Booboo May 04 '21 at 01:02
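The pool approach described in the comment above can be sketched as follows (this is a hypothetical rework of the question's code, not the asker's; the output filename `out_all.txt` is an assumption):

```python
from multiprocessing import Pool

def factorial(n):
    """Compute n! iteratively and return it."""
    num = 1
    while n >= 1:
        num *= n
        n -= 1
    return num

if __name__ == '__main__':
    N = [10000, 20000, 40000, 50000]  # numbers from the question
    with Pool(processes=4) as pool:
        # map returns the results in the same order as N
        results = pool.map(factorial, N)
    # only the main process writes, so there is no file contention
    with open('out_all.txt', 'w') as f:
        for n, res in zip(N, results):
            f.write(f'{n}! = {res}\n')
```

Because the workers return values instead of writing files, all the file I/O happens in one place and no locking is needed.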

1 Answer


You can do this.

  1. Create a queue:
   a) manager = Manager()
   b) data_queue = manager.Queue()
   c) have each process put its data in this queue.
  2. Create a thread and start it before the multiprocessing:
   a) give it a function that waits on data_queue, something like

def fun():
    while True:
        data = data_queue.get()
        if isinstance(data, Sentinel):
            break
        # write to a file

  3. Remember to send a Sentinel object through the queue after all the processes are done.

You can also make this thread a daemon thread and skip the sentinel part.

vks
  • Hi vks, though I appreciate your reply very much, can you please share a simple minimum working example? Being a beginner, I'm afraid I don't understand the steps you mentioned and would really appreciate a minimal working code sample. – nuki May 03 '21 at 23:13
  • And why wouldn't you just use a multiprocessing pool with either `imap` or `map`? Just because the OP thinks it's "computationally inefficient?" All you are doing is reinventing your own multiprocessing pool. – Booboo May 04 '21 at 01:06
  • @Booboo it's up to the OP to use map or imap... but still, he can use a queue and a thread to write to the same file without clobbering any data – vks May 04 '21 at 19:56
  • @nuki you have to use data_queue.put(data) where you would have written the file... the rest is just creating a thread and processing the data. – vks May 04 '21 at 19:56