
I wrote a function of about 400 lines. The function performs some kind of data science on a dataframe. When I run it, it takes about 10 seconds. I need to run this function 100 times with different arguments in each iteration, so inside a loop I call the function 100 times, each time with 4 different arguments. In total this took about 15 minutes. Therefore I want to use CPU parallelization. How can I use multiprocessing in Python to parallelize these calls and improve the runtime?

Code example:

result = []
for i in range(100):
    result.append(searching_algorithm(a[i], b[i], c[i], d[i]))
  • So what's your problem? Have you tried searching? Maybe the official docs of the [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) and [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures) modules could be a good starting point. – Olvin Roght Aug 04 '21 at 19:15
  • @Olvin Roght, thanks for the comment. I saw those docs but couldn't work out the right way to do multiprocessing. I don't know exactly how I should call the multiprocessing function and how I should pass the arguments. – Ali Aug 04 '21 at 19:25
  • You saw but didn't read. Each of the links in my previous comment contains an *Examples* section ([1](https://docs.python.org/3/library/multiprocessing.html#examples), [2](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor-example)). – Olvin Roght Aug 04 '21 at 19:54

1 Answer


You did not say what type of lists a, b, c and d are. The elements in these lists must be serializable with the pickle module because they need to be passed to a function executed by a process running in a different address space. For the sake of argument, let's assume they are lists of integers of at least length 100.
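
If you are not sure whether a given argument can be pickled, a quick round-trip test tells you (a minimal sketch, not part of the original answer):

import pickle

def is_picklable(obj):
    # Returns True if obj survives a pickle round trip and can therefore
    # be sent to a worker process:
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

print(is_picklable(42))           # True
print(is_picklable(lambda x: x))  # False: lambdas cannot be pickled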

You also did not state what platform you are running under (Windows? macOS? Linux?). When you tag a question with multiprocessing, you are supposed to also tag the question with the platform. How you organize your code depends somewhat on the platform. In the code below, I have chosen the most efficient arrangement for those platforms that use spawn to create new processes, namely Windows (and macOS as of Python 3.8). But it will also be efficient on Linux, which by default uses fork to create new processes. You can research what spawn and fork mean in connection with creating new processes. Ultimately, to be memory and CPU efficient, you only want as global variables outside of an if __name__ == '__main__': block those variables that have to be global. This is why I have made the declaration of the lists local to a function.
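
To see why this matters, consider this minimal sketch (my illustration, not part of the original answer). Under spawn, every worker process re-imports the main module, so any expensive top-level code runs once per worker; only code guarded by the if __name__ == '__main__': test is limited to the parent:

from multiprocessing import Process

# Under spawn, this top-level line runs again in every child process,
# because each child re-imports the main module:
print('module-level code executing')

def work():
    pass

if __name__ == '__main__':
    # __name__ is not '__main__' in the re-imported child, so only the
    # parent process runs this block:
    p = Process(target=work)
    p.start()
    p.join()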

Then using the concurrent.futures module we have:

from concurrent.futures import ProcessPoolExecutor

def searching_algorithm(a, b, c, d):
    ...
    return a * b * c * d

def main():
    # We assume a, b, c and d each have 100 or more elements:
    a = list(range(1, 101))
    b = list(range(2, 102))
    c = list(range(3, 103))
    d = list(range(4, 104))
    # Use all CPU cores:
    with ProcessPoolExecutor() as executor:
        result = list(executor.map(searching_algorithm, a[0:100], b[0:100], c[0:100], d[0:100]))
    print(result[0], result[-1])

# Required for Windows:
if __name__ == '__main__':
    main()

Prints:

24 106110600

The first value is 1 * 2 * 3 * 4 = 24 and the last is 100 * 101 * 102 * 103 = 106110600.

To use the multiprocessing module instead:

from multiprocessing import Pool

def searching_algorithm(a, b, c, d):
    ...
    return a * b * c * d

def main():
    # We assume a, b, c and d each have 100 or more elements:
    a = list(range(1, 101))
    b = list(range(2, 102))
    c = list(range(3, 103))
    d = list(range(4, 104))
    # Use all CPU cores:
    with Pool() as pool:
        result = pool.starmap(searching_algorithm, zip(a[0:100], b[0:100], c[0:100], d[0:100]))
    print(result[0], result[-1])

# Required for Windows:
if __name__ == '__main__':
    main()

In both coding examples, if the lists a, b, c and d contain exactly 100 elements, there is no need to take slices of them such as a[0:100]; just pass the lists themselves, e.g.:

        result = list(executor.map(searching_algorithm, a, b, c, d))
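
As an aside (my addition, not from the original answer): if you would rather receive results as they finish instead of in input order, concurrent.futures also offers submit together with as_completed:

from concurrent.futures import ProcessPoolExecutor, as_completed

def searching_algorithm(a, b, c, d):
    return a * b * c * d

def main():
    # Same hypothetical argument lists as above, pre-zipped into tuples:
    args = list(zip(range(1, 101), range(2, 102), range(3, 103), range(4, 104)))
    with ProcessPoolExecutor() as executor:
        # Map each future back to the arguments that produced it:
        futures = {executor.submit(searching_algorithm, *arg): arg for arg in args}
        for future in as_completed(futures):
            # result() re-raises any exception raised in the worker:
            print(futures[future], '->', future.result())

if __name__ == '__main__':
    main()

Note that the output order is nondeterministic, which is the trade-off for getting each result as soon as it is ready.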
  • I use Windows OS – Ali Aug 04 '21 at 20:45
  • I added the `windows` tag to your question (you could have edited your question and done this yourself). Multiprocessing is a big, complicated topic. But if you look at the above code and then look up these classes and methods in the Python documentation, hopefully it will make sense. So after you have "digested" this, let me know if this has satisfactorily answered your question. – Booboo Aug 04 '21 at 22:34
  • thanks a lot for your answer, this is very useful for me. I wrote my code as you wrote here and it worked, but it takes much more time than I expected. When I run the function once on the CPU it takes about 6 seconds, and I expected that applying multiprocessing to 100 different function calls would take at most about 10 seconds. But in my case it took 80 seconds. What can be the reason for that? – Ali Aug 05 '21 at 08:56
  • You need to post more of your code (what is `searching_algorithm` doing and what packages is it using?). Also, how many cores do you have? But if you ran it without multiprocessing, this would take 100 * 6 = 600 seconds, so this is a considerable performance improvement. There is also a distinction between logical and physical cores. My desktop has 8 logical cores given by `multiprocessing.cpu_count()` or `os.cpu_count()`, and that would be my default pool size. But really there are only 4 physical cores. (more...) – Booboo Aug 05 '21 at 09:35
  • See [Multiprocessing: use only the physical cores?](https://stackoverflow.com/questions/40217873/multiprocessing-use-only-the-physical-cores). So, for example, you might have 14 logical cores, but you won't get a reduction in time by a factor of 14 (see the sketch after these comments). I don't, however, agree that you should not use all the logical processors, especially if there is some I/O being done by your function. – Booboo Aug 05 '21 at 09:44
  • I got what you mean. On my machine there are 10 physical cores, and when I ran the program CPU utilization was 100%. I think it will be much faster when I add more cores to my machine. Until now everything is clear. Thanks Booboo – Ali Aug 05 '21 at 19:33
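
For the logical vs. physical core discussion above, here is a minimal sketch of how to inspect both counts and cap the pool size (psutil is a third-party package and my assumption here; it is not used anywhere in the answer):

import os

import psutil  # third-party: pip install psutil

logical = os.cpu_count()                    # logical cores, hyper-threads included
physical = psutil.cpu_count(logical=False)  # physical cores only
print(f'{logical} logical cores, {physical} physical cores')

# To cap the pool at the physical core count:
# with ProcessPoolExecutor(max_workers=physical) as executor:
#     ...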