
Apologies for the poor title; I was having trouble figuring out how to word it.

I've written a Python program that reads in a file of words, stores them in a list, then iterates over the list and performs a check on each item.

I'm now trying to speed this up as the list is quite large.

I'm trying to use the Python multiprocessing module to achieve this. I have included an example of the code below, but I have expanded out some loops to make it clearer what is going on. Essentially, I am trying to split the list into 10 parts, then send each part to a separate process. The program works and returns the expected result, but the checking part takes around 22 seconds to run.

import time
import pickle
import multiprocessing as mp
import check

def check_v3(read_list, query, return_list):
    if isinstance(query, str):
        query = list(query)
        query_len = len(query)
        if query_len == 1:
            results = check.check_v3_1(read_list, query)
        elif query_len == 2:
            results = check.check_v3_2(read_list, query)
        elif query_len == 3:
            results = check.check_v3_3(read_list, query)
        elif query_len == 4:
            results = check.check_v3_4(read_list, query)
        elif query_len == 5:
            results = check.check_v3_5(read_list, query)
        elif query_len == 6:
            results = check.check_v3_6(read_list, query)
        elif query_len == 7:
            results = check.check_v3_7(read_list, query)
        return_list.append(results)

def read_pickle(file_name):
    with open(file_name, "rb") as fin:
        read_list = pickle.load(fin)
    return read_list

if __name__ == "__main__":
    read_list = read_pickle("pickled_list")

    split_list_1 = read_list[:(round(len(read_list)/10))]
    split_list_2 = read_list[(round(len(read_list)/10)*1):(round(len(read_list)/10)*2)]
    split_list_3 = read_list[(round(len(read_list)/10)*2):(round(len(read_list)/10)*3)]
    split_list_4 = read_list[(round(len(read_list)/10)*3):(round(len(read_list)/10)*4)]
    split_list_5 = read_list[(round(len(read_list)/10)*4):(round(len(read_list)/10)*5)]
    split_list_6 = read_list[(round(len(read_list)/10)*5):(round(len(read_list)/10)*6)]
    split_list_7 = read_list[(round(len(read_list)/10)*6):(round(len(read_list)/10)*7)]
    split_list_8 = read_list[(round(len(read_list)/10)*7):(round(len(read_list)/10)*8)]
    split_list_9 = read_list[(round(len(read_list)/10)*8):(round(len(read_list)/10)*9)]
    split_list_10 = read_list[(round(len(read_list)/10)*9):]

    query = "check"
    
    manager = mp.Manager()
    return_list = manager.list()

    p1 = mp.Process(target=check_v3, args=(split_list_1, query, return_list))
    p2 = mp.Process(target=check_v3, args=(split_list_2, query, return_list))
    p3 = mp.Process(target=check_v3, args=(split_list_3, query, return_list))
    p4 = mp.Process(target=check_v3, args=(split_list_4, query, return_list))
    p5 = mp.Process(target=check_v3, args=(split_list_5, query, return_list))
    p6 = mp.Process(target=check_v3, args=(split_list_6, query, return_list))
    p7 = mp.Process(target=check_v3, args=(split_list_7, query, return_list))
    p8 = mp.Process(target=check_v3, args=(split_list_8, query, return_list))
    p9 = mp.Process(target=check_v3, args=(split_list_9, query, return_list))
    p10 = mp.Process(target=check_v3, args=(split_list_10, query, return_list))
    
    start_time = time.time()
    
    p1.start()
    p2.start()
    p3.start()
    p4.start()
    p5.start()
    p6.start()
    p7.start()
    p8.start()
    p9.start()
    p10.start()

    p1.join()
    p2.join()
    p3.join()
    p4.join()
    p5.join()
    p6.join()
    p7.join()
    p8.join()
    p9.join()
    p10.join()

    print("--- %s seconds ---" % (time.time() - start_time))

    print(return_list)
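(For reference, the ten `split_list_N` assignments and `Process` objects above are just an expanded-out loop. A minimal sketch of the same chunking logic as a helper function, assuming the same `check_v3`, `query`, and `return_list` as above:)

```python
import multiprocessing as mp

def chunk(lst, n):
    """Split lst into n contiguous chunks of roughly equal size."""
    size = len(lst) // n
    chunks = [lst[i * size:(i + 1) * size] for i in range(n - 1)]
    chunks.append(lst[(n - 1) * size:])  # last chunk absorbs any remainder
    return chunks

# usage sketch, equivalent to the ten split_list_N / pN lines:
# processes = [mp.Process(target=check_v3, args=(part, query, return_list))
#              for part in chunk(read_list, 10)]
# for p in processes: p.start()
# for p in processes: p.join()
```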

I thought this was taking longer than expected, so I tried something else to see if it would still take as long (see below). I essentially copied and pasted the Python code four times, but hard-coded each program so that it would only run the checks on a different quarter of the same list given to the original program. Each program outputs a pickled version of its results, and a final script then compiles the four separate pickled lists back together. When I run this with the bash script below, the checking part of each program takes under 2 seconds to run.

#!/bin/bash
python3 check_1_4.py & 
python3 check_2_4.py &
python3 check_3_4.py & 
python3 check_4_4.py &
wait
python3 -i read_4split.py

I'm not sure why there is such a big difference between the single Python script and the bash script that just launches multiple Python scripts. I'm sure there is something obvious I am missing here, but I just can't seem to find what it is.

    Does this help at all? https://stackoverflow.com/questions/20727375/multiprocessing-pool-slower-than-just-using-ordinary-functions – JonSG Mar 23 '23 at 13:41
  • Hey @JonSG, thanks for your response, unfortunately it doesn't help. The solution found for that issue was to manually split the list that they were using but I've already done that when defining the many split_list_X that I have. – Tgumtree Mar 23 '23 at 13:56
  • Please post the code of the `check` module and a link to the file you are using. – Louis Lac Mar 23 '23 at 14:35
  • Unfortunately I can't post the code to the check module as it is company code, the file is a pickled list of lists as such `[["hello", "goodbye"],["apple", "banana"], ... ["cat", "dog"]]`. The check module takes in that list of lists, iterates over it and performs some checks. The results are then appended to the list `return_list` that is passed into each process. – Tgumtree Mar 23 '23 at 14:39
  • I want to make sure I understand your question: In your first approach you start 10 processes and each process does one `append` operation on the passed managed list. Your second approach essentially runs this program 4 times but the *read_list* argument to `check_v3` is one fourth the original size and instead of appending the results to a managed list, it is outputting the results using `pickle` and then a fifth program reads in the 4 output files and creates a single list from that output. The second approach takes only 2 seconds instead of 22 seconds. Is that correct? – Booboo Mar 24 '23 at 21:24

0 Answers