
I was trying to speed up the following code using multiprocessing, so I wrote this version after getting some help here:

from multiprocessing import Pool, Manager, cpu_count
from functools import partial
import re


def process_value(data_set_N, i_1):
    print(i_1)
    for i_2 in list(range(10))+list("+-*/")+["=="]:
        for i_3 in list(range(10))+list("+-*/")+["=="]:
            for i_4 in list(range(10))+list("+-*/")+["=="]:
                for i_5 in list(range(10))+list("+-*/")+["=="]:
                    for i_6 in list(range(10))+list("+-*/")+["=="]:
                        for i_7 in list(range(10))+list("+-*/")+["=="]:
                            for i_8 in list(range(10)):
                                try:
                                    value = str(i_1)+str(i_2)+str(i_3)+str(i_4)+str(i_5)+str(i_6)+str(i_7)+str(i_8)
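                                    # skip candidates where two adjacent '/' characters would be parsed as floor division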
                                    if '//' in value:
                                        continue
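                                    # strip leading zeros (e.g. '07' -> '7') so eval() doesn't raise a SyntaxError; a lone '0' is kept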
                                    valuev = re.sub(r'\b0+(?!\b)', '', value)
                                    evaluation = eval(valuev)
                                    if type(evaluation) == type(True) and evaluation:
                                        data_set_N.append(value)
                                except:
                                    continue

if __name__ == '__main__':
    with Manager() as manager:
        data_set_N = manager.list()
        # The iterable is the i_1 list:
        i_1_list = list(range(10))+list("+-")
        POOL_SIZE = min(cpu_count(), len(i_1_list))
        pool = Pool(POOL_SIZE)
        pool.map(partial(process_value, data_set_N), i_1_list)
        pool.close()
        pool.join()
        data_set_N=list(data_set_N)
        print(len(data_set_N))

It works, but it was supposed to take less time (roughly the original time divided by the number of CPUs). After waiting around 24 hours, the code was still running.

So I tested the code to compare the two versions (with and without multiprocessing), using fewer nested loops (6 instead of 8).

from multiprocessing import Pool, Manager, cpu_count
from functools import partial
import re



def process_value(data_set_N, i_1):
    for i_2 in list(range(10))+list("+-*/")+["=="]:
        for i_3 in list(range(10))+list("+-*/")+["=="]:
            for i_4 in list(range(10))+list("+-*/")+["=="]:
                for i_5 in list(range(10))+list("+-*/")+["=="]:
                    for i_6 in list(range(10)):
                        try:
                            value = str(i_1)+str(i_2)+str(i_3)+str(i_4)+str(i_5)+str(i_6)
                            if '//' in value:
                                continue
                            valuev = re.sub(r'\b0+(?!\b)', '', value)
                            evaluation = eval(valuev)
                            if type(evaluation) == type(True) and evaluation:
                                data_set_N.append(value)
                        except:
                            continue

import time
start_time = time.time()
if __name__ == '__main__':
    with Manager() as manager:
        data_set_N = manager.list()
        # The iterable is the i_1 list:
        i_1_list = list(range(10))+list("+-")
        POOL_SIZE = min(cpu_count(), len(i_1_list))
        pool = Pool(POOL_SIZE)
        pool.map(partial(process_value, data_set_N), i_1_list)
        pool.close()
        pool.join()
        data_set_N=list(data_set_N)
        print(len(data_set_N))
  
    print("--- %s seconds ---" % (time.time() - start_time))

This one gives 4687 --- 46.11929202079773 seconds ---

While without multiprocessing:

import time
start_time = time.time()

data_set_N =[]
for i_1 in list(range(10))+list("+-"):
    for i_2 in list(range(10))+list("+-*/")+["=="]:
        for i_3 in list(range(10))+list("+-*/")+["=="]:
            for i_4 in list(range(10))+list("+-*/")+["=="]:
                for i_5 in list(range(10))+list("+-*/")+["=="]:
                    for i_6 in list(range(10)):
                        try:
                            value = str(i_1)+str(i_2)+str(i_3)+str(i_4)+str(i_5)+str(i_6)
                            if '//' in value:
                                continue
                            valuev = re.sub(r'\b0+(?!\b)', '', value)
                            evaluation = eval(valuev)
                            if type(evaluation) == type(True) and evaluation:
                                data_set_N.append(value)
                        except:
                            continue
data_set_N=list(data_set_N)
print(len(data_set_N))



print("--- %s seconds ---" % (time.time() - start_time))

This gives 4687 --- 48.32949733734131 seconds ---, which is almost the same.

I have 12 CPUs (cpu_count() returns 12), yet there is almost no difference. I don't know why using multiprocessing doesn't make it run faster.

  • I cannot reproduce the issue. I get 13.25... vs 43.12... seconds timing on a 4 CPU machine. – MisterMiyagi Feb 08 '22 at 16:39
  • @Anass what operating system are you using? – ti7 Feb 08 '22 at 16:41
  • That said, even for the reduced example you keep only 4687 out of 6075000 samples – that's less than one in a thousand. For the full case, this will be even worse. You will likely get much better speedup by optimising this algorithmically than by brute-forcing it. – MisterMiyagi Feb 08 '22 at 16:44
  • cannot reproduce: multithreaded version is more than 3 times faster (on a dual core with hyperthreading). What's your IDE, OS, and python version? – Aaron Feb 08 '22 at 17:10
  • I now used Visual Studio Code and got 7.9 s vs 40.1 s. For the first execution (where there was no difference) I was using Jupyter, Python 3.8.8. For the last execution I used Visual Studio Code, Python 3.9.2 64-bit, and Windows 10 for both. I think Jupyter notebooks have a problem with multiprocessing. (Might try with Colab later) – Anass Feb 08 '22 at 17:26
  • Colab runs on a remote server not your own computer fyi, so speed comparison would be moot. Jupyter could be mis-configured idk, multiprocessing is absolutely possible with notebooks, but is somewhat quirky – Aaron Feb 08 '22 at 17:32
  • @MisterMiyagi thanks for your suggestion. I thought about using the symmetry of operations (getting the result with "==" by going from zero to half the list), which would divide the number of samples by 2. But when I saw the result it's 4687, not even divisible by 2, so the idea of symmetry wouldn't work. – Anass Feb 08 '22 at 17:35
  • @Aaron thanks, I guess you are right (about Colab). I tried to see if someone else got a similar problem with notebooks; I found these examples [not finishing the process](https://stackoverflow.com/questions/47313732/jupyter-notebook-never-finishes-processing-using-multiprocessing-python-3) and [AttributeError](https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror). I didn't find the same problem though. I will look further to see if it's a configuration problem. But for now I'll settle for the result given by Visual Studio Code. – Anass Feb 08 '22 at 17:43
  • "I thought about using the symmetry" It's not about *literal* symmetry, but a logical one. For example, ``1-10==-9`` is symmetric to ``-9==1-10`` but not to ``9-==01-1``. Likewise, there is much bogus data from nonsensical literal sequences, such as ``+****4``. In short, it would be more efficient to create *logical* expression *trees* instead of *literal* expression *codes*. But that's an entirely different question... – MisterMiyagi Feb 08 '22 at 17:52

1 Answer


In every loop you access the managed memory:

data_set_N.append(value)

Every time a process reaches this line, it has to wait for the Manager to update the underlying memory for all of the processes. Since the rest of the loop is rather trivial (and fast), most of the time is spent waiting for the Manager. Note that the Manager has to present the "correct" content to every process, so two processes cannot append a value to the list at the same time; this part of the code effectively runs in a single process. Add to that the non-trivial bookkeeping the Manager does on each append, and you arrive at roughly the same speed as without multiprocessing at all.
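
For a rough sense of that per-append cost, a small comparison of manager.list() against an ordinary list might look like this (a hypothetical micro-benchmark, not from the post above; absolute timings will vary by machine):

from multiprocessing import Manager
import time

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.list()
        t0 = time.time()
        for i in range(10_000):
            shared.append(i)   # every append is a round-trip to the separate Manager process
        print("manager.list():", time.time() - t0)

    plain = []
    t0 = time.time()
    for i in range(10_000):
        plain.append(i)        # ordinary in-process append
    print("plain list:    ", time.time() - t0)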

Nevertheless, if you don't need anything special, you can make this much quicker simply by not using the Manager:

def process_value(data_set_N, i_1):  # <- instead of this 
def process_value(i_1):  # <- use this
...
                                data_set_N.append(value)   # <- instead of this
                                return value  # <- use this
...
        pool.map(partial(process_value, data_set_N), i_1_list)  # <- instead of this
        data_set_N = pool.map(process_value, i_1_list)  # <- use this
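
Putting those pieces together, a minimal sketch of the no-Manager approach for the reduced 6-character search could look like the following (the TOKENS name is just shorthand introduced here for the repeated option list). Since each worker finds many matches, as noted in the comments below, it returns a local list per i_1 and the parent flattens the results:

from multiprocessing import Pool, cpu_count
import re

TOKENS = list(range(10)) + list("+-*/") + ["=="]

def process_value(i_1):
    # No shared state: collect matches in a local list and return it.
    results = []
    for i_2 in TOKENS:
        for i_3 in TOKENS:
            for i_4 in TOKENS:
                for i_5 in TOKENS:
                    for i_6 in range(10):
                        value = str(i_1) + str(i_2) + str(i_3) + str(i_4) + str(i_5) + str(i_6)
                        if '//' in value:
                            continue
                        try:
                            evaluation = eval(re.sub(r'\b0+(?!\b)', '', value))
                        except Exception:
                            continue
                        if evaluation is True:
                            results.append(value)
    return results

if __name__ == '__main__':
    i_1_list = list(range(10)) + list("+-")
    with Pool(min(cpu_count(), len(i_1_list))) as pool:
        # pool.map returns one list of matches per i_1; flatten them afterwards.
        chunks = pool.map(process_value, i_1_list)
    data_set_N = [v for chunk in chunks for v in chunk]
    print(len(data_set_N))

With the results handed back through pool.map, there is no inter-process synchronisation inside the hot loop; each worker communicates only once, when it returns its finished list.
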
  • The synchronised action is taken less than once per 1000 iterations. The still costly regex replacement and eval, plus all the other tiny but numerous actions around them, are thus weighted by a factor of >1000 and should be significant for the runtime. – MisterMiyagi Feb 08 '22 at 17:05
  • Each process finds more than one result, so just ``return``'ing on the first result is not going to produce the correct outcome. Appending to a *local* queue and returning it should be correct. – MisterMiyagi Feb 08 '22 at 17:07