
Let's say I have a list like this:

list_base = ['a','b','c','d']

If I used for xxx in list_base:, the loop would process the list one value at a time. To double the speed of this work, I want to build a list of two-value chunks and iterate over it, handing both values to multiprocessing at once.
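For example, the pairing I have in mind would turn the list into two-value chunks (a minimal sketch of just the splitting step):

list_base = ['a', 'b', 'c', 'd']
# split into chunks of two, so each loop iteration can hand two values to multiprocessing
pairs = [list_base[i:i+2] for i in range(0, len(list_base) - 1, 2)]
print(pairs)  # [['a', 'b'], ['c', 'd']]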

Basic example

Code 1 (main_code.py):

import api_values

if __name__ == '__main__':
    list_base = ['a','b','c','d']
    api_values.main(list_base)

Code 2 (api_values.py):

import multiprocessing
import datetime

def add_hour(x):
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M')

def main(list_base):
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])  

    final_list = []

    for a, b in a_pairs:
        mp_1 = multiprocessing.Process(target=add_hour, args=(a,))
        mp_2 = multiprocessing.Process(target=add_hour, args=(b,))
        mp_1.start()
        mp_2.start()
        mp_1.join()
        mp_2.join()
        final_list.append(mp_1)
        final_list.append(mp_2)

    print(final_list)

When I print final_list, it delivers values like this:

[
<Process name='Process-1' pid=9564 parent=19136 stopped exitcode=0>, 
<Process name='Process-2' pid=5400 parent=19136 stopped exitcode=0>, 
<Process name='Process-3' pid=13396 parent=19136 stopped exitcode=0>, 
<Process name='Process-4' pid=5132 parent=19136 stopped exitcode=0>
]

I couldn't get the return values I want from the add_hour(x) calls.

I found some answers in this question:
How can I recover the return value of a function passed to multiprocessing.Process?

But I couldn't adapt them to my scenario, where the multiprocessing needs to live inside a function rather than directly inside if __name__ == '__main__':

When I try to use them, I always get errors related to how the code is structured, so I would like some help visualizing how to apply this to my need.
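For reference, this is roughly how I understand the multiprocessing.Queue pattern from that question would sit inside a function (a minimal sketch of my own, not code taken from that answer):

import multiprocessing
import datetime

def add_hour(x, queue):
    # instead of returning the value, put it on the shared queue
    queue.put(str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M'))

def main(list_base):
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])

    final_list = []
    queue = multiprocessing.Queue()

    for pair in a_pairs:
        processes = [multiprocessing.Process(target=add_hour, args=(x, queue)) for x in pair]
        for p in processes:
            p.start()
        # drain the results before joining; the order within a pair is not guaranteed
        final_list.extend(queue.get() for _ in processes)
        for p in processes:
            p.join()

    print(final_list)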

Note:
These are basic examples; my real use case is extracting data from an API that allows a maximum of two simultaneous calls.

Additional code:

Following @Timus's comment (You might want to look into a **Pool** and **.apply_async**), I came to the code below. It seems to work, but I don't know if it is reliable, whether it needs any improvement, or whether this option is the best; feel free to address that in an answer:

import multiprocessing
import datetime

final_list = []

def foo_pool(x):
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M:%S')

def log_result(result):
    final_list.append(result)

def main(list_base):
    pool = multiprocessing.Pool()
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])

    for a, b in a_pairs:
        pool.apply_async(foo_pool, args = (a, ), callback = log_result)
        pool.apply_async(foo_pool, args = (b, ), callback = log_result)
    pool.close()
    pool.join()

    print(final_list)
Digital Farmer

2 Answers


I think you need shared strings between processes. They can be obtained from multiprocessing.Manager().

Your api_values.py should look like this:

import multiprocessing
import datetime
from ctypes import c_wchar_p

def add_hour(x, ret_str):
    ret_str.value = str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M')

def main(list_base):
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])  

    final_list = []
    manager = multiprocessing.Manager()

    for a, b in a_pairs:
        
        ret_str_a = manager.Value(c_wchar_p, "")
        ret_str_b = manager.Value(c_wchar_p, "")
        mp_1 = multiprocessing.Process(target=add_hour, args=(a, ret_str_a))
        mp_2 = multiprocessing.Process(target=add_hour, args=(b, ret_str_b))
        mp_1.start()
        mp_2.start()
        mp_1.join()
        mp_2.join()
        final_list.append(ret_str_a.value)
        final_list.append(ret_str_b.value)

    print(final_list)

Source: How to share a string amongst multiple processes using Managers() in Python?

mbostic
  • Perfect, friend, it worked like a glove for my need too. I'll leave the question open for a while longer so I can see other answers; this can help me in learning. Thanks a lot for the help! – Digital Farmer Apr 15 '22 at 18:43

You don't have to use a callback: Pool.apply_async() returns an AsyncResult object, which has a .get() method to retrieve the result of the submitted call. Extension of your attempt:

import time
import multiprocessing
import datetime
from os import getpid

def foo_pool(x):
    print(getpid())
    time.sleep(2)
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M:%S')

def main(list_base):
    a = list_base
    a_pairs = [a[i:i+2] for i in range(0, len(a)-1, 2)]
    if (len(a) % 2) != 0:
        a_pairs.append([a[-1]])  

    final_list = []
    with multiprocessing.Pool(processes=2) as pool:
        for a, b in a_pairs:
            res_1 = pool.apply_async(foo_pool, args=(a,))
            res_2 = pool.apply_async(foo_pool, args=(b,))
            final_list.extend([res_1.get(), res_2.get()])

    print(final_list)

if __name__ == '__main__':
    list_base = ['a','b','c','d']
    start = time.perf_counter()
    main(list_base)
    end = time.perf_counter()
    print(end - start)

I have added the print(getpid()) to foo_pool to show that you're actually using different processes. And I've used time to illustrate that, despite the time.sleep(2) in foo_pool, the overall duration of main isn't much more than 2 seconds.

Timus
  • Hello friend, thank you very much in advance. But I believe we have a problem: it is making 4 calls at the same time instead of at most two (I commented in the question about this limitation), and this ends up causing a conflict with my API. I tried adding a ```pool.join()``` but it didn't work. Would you be available to analyze this detail, please? – Digital Farmer Apr 15 '22 at 20:13
  • What I mean is: when adding this ```time.sleep(2)```, ```a``` and ```b``` should have the same time value, but ```c``` and ```d``` should deliver a time some seconds later, not the same one. Do you understand? – Digital Farmer Apr 15 '22 at 20:16
  • 1
    @BrondbyIF I think I understand: See the edit. – Timus Apr 15 '22 at 20:19
  • 1
    Perfect @Timus , I just did the new test and now it delivered that necessary wait between loops, thank you very much first for the tip on which way to go and for bringing an answer later with the correct way to use it! – Digital Farmer Apr 15 '22 at 20:21
  • ```['a - 15/04/2022 17:21:41', 'b - 15/04/2022 17:21:41', 'c - 15/04/2022 17:21:43', 'd - 15/04/2022 17:21:43']```, 2 seconds between each loop, demonstrating that it makes a maximum of two calls at a time and waits for the end of those two calls before continuing the loop. – Digital Farmer Apr 15 '22 at 20:22
  • Friend @Timus, I noticed there was an update in the answer adding ```processes=2```; before, 4 different ids appeared, like ```4178 2547 7936 6947```, and now only two appear, repeating: ```4178 2547 4178 2547```. Does this make any difference in terms of my need (for API use), and is it better to repeat the ID? – Digital Farmer Apr 15 '22 at 20:46
  • 1
    @BrondbyIF No, it shouldn't make a difference. But since you're only using 2 processes at a time, I thought it would be sensible to limit the amount (_processes_ is the number of worker processes to use). (Creating a new process comes with a certain amount of overhead.) – Timus Apr 15 '22 at 20:53
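For completeness, if the only hard requirement is that the API never sees more than two calls in flight (rather than strict pair-by-pair batching), a 2-worker pool enforces that limit by itself. This is a minimal sketch of that variation (my own, not code from the answer above):

import time
import datetime
import multiprocessing

def foo_pool(x):
    time.sleep(2)
    return str(x) + ' - ' + datetime.datetime.now().strftime('%d/%m/%Y %H:%M:%S')

def main(list_base):
    # with processes=2 the pool never runs more than two calls at once,
    # but a third call may start as soon as one of the first two finishes
    with multiprocessing.Pool(processes=2) as pool:
        final_list = pool.map(foo_pool, list_base)
    print(final_list)

if __name__ == '__main__':
    main(['a', 'b', 'c', 'd'])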