
I want to run two functions which in turn run a query against two different databases. Both functions take the same args, and it seems I could speed this up by utilising multiprocessing.

However, I am not quite sure what multiprocessing class I should use.

I initially looked at the Pool class, and it seems I could use map_async(). However, I also saw in this answer that map_async() in a similar use case does not actually let the functions run in parallel: https://stackoverflow.com/a/44155077/7468886. Does anybody know why?

If this is the case, are there any other solutions that can be used?
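For what it's worth, Pool can also run two *different* functions in parallel via apply_async(), which sidesteps map_async()'s one-function limitation. A minimal sketch, with hypothetical stand-in functions in place of the real database calls:

```python
from multiprocessing import Pool

# Hypothetical stand-ins for the two database queries; each just
# returns a labelled result so the sketch is self-contained.
def search_sql(title, type, release_year):
    return [{"source": "sql", "title": title, "release_year": release_year}]

def search_mongo(title, type, release_year):
    return [{"source": "mongo", "title": title, "release_year": release_year}]

if __name__ == "__main__":
    kwargs = {"title": "Alien", "type": "movie", "release_year": 1979}
    with Pool(processes=2) as pool:
        # apply_async() submits each function immediately and returns an
        # AsyncResult; both calls run in separate worker processes.
        sql_async = pool.apply_async(search_sql, kwds=kwargs)
        mongo_async = pool.apply_async(search_mongo, kwds=kwargs)
        # .get() blocks until each worker finishes.
        sql_results = sql_async.get()
        mongo_results = mongo_async.get()
    consolidated = sql_results + mongo_results
    print(consolidated)
```

Note that the worker functions must be defined at module level so they can be pickled, and the Pool setup belongs under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork.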

Below are the two functions. After the results are returned, I consolidate them.

    sql_manager = SQLManager()
    sql_results = sql_manager.search_data(
        title=title,
        type=type,
        release_year=release_year)

    mongo_manager = MongoManager()
    mongo_results = mongo_manager.search_data(
        title=title,
        type=type,
        release_year=release_year)
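Since both calls spend most of their time waiting on the database, they can be run concurrently with threads instead. A sketch using concurrent.futures, with placeholder manager classes standing in for the real SQLManager and MongoManager:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder managers; each search_data just returns a labelled
# result in place of a real database query.
class SQLManager:
    def search_data(self, title, type, release_year):
        return [{"source": "sql", "title": title}]

class MongoManager:
    def search_data(self, title, type, release_year):
        return [{"source": "mongo", "title": title}]

def parallel_search(title, type, release_year):
    kwargs = dict(title=title, type=type, release_year=release_year)
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both queries are submitted immediately and run concurrently;
        # .result() blocks until each one finishes.
        sql_future = executor.submit(SQLManager().search_data, **kwargs)
        mongo_future = executor.submit(MongoManager().search_data, **kwargs)
        return sql_future.result() + mongo_future.result()

results = parallel_search("Alien", "movie", 1979)
print(results)
```

Threads avoid the pickling and process-startup overhead of multiprocessing, and the GIL is released while each thread blocks on network I/O, so the two queries genuinely overlap.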

Edit:

The search_data() functions do not run a simple query but actually the following:

  • Constructs a query string based on args
  • Retrieves a class member database connection
  • Initialises a cursor
  • Executes the query
  • Iterates through each row and creates a dict from each row
  • Appends this dict into a list
  • Returns the list
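The steps above could be sketched like this, using an in-memory sqlite3 database as a stand-in for the real connection (the table name and columns are made up for illustration):

```python
import sqlite3

class SQLManager:
    def __init__(self):
        # Class-member database connection (step 2 above); an in-memory
        # sqlite3 database stands in for the real one.
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE media (title TEXT, type TEXT, release_year INTEGER)")
        self.connection.execute(
            "INSERT INTO media VALUES ('Alien', 'movie', 1979)")

    def search_data(self, title, type, release_year):
        # Construct a query string based on the args.
        query = ("SELECT title, type, release_year FROM media "
                 "WHERE title = ? AND type = ? AND release_year = ?")
        # Initialise a cursor and execute the query.
        cursor = self.connection.cursor()
        cursor.execute(query, (title, type, release_year))
        # Iterate through each row, create a dict per row,
        # append it to a list, and return the list.
        results = []
        for row in cursor.fetchall():
            results.append(
                {"title": row[0], "type": row[1], "release_year": row[2]})
        return results
```

Even with the per-row dict building, the dominant cost is usually the query itself, so the function remains I/O-bound overall.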

Given that the function does all of this, would it still lend itself to multiprocessing rather than multithreading?

    I think in this case actually you would want to use multi-threading and not multi-processing – gold_cy Jan 03 '20 at 15:00
  • I thought multiprocessing would be applicable as this would get around the GIL. Does multithreading in this example guarantee that each function would be run concurrently? – vinayman Jan 03 '20 at 15:08
  • This is a typical use case for multi-threading or async-io, here is a nice reference for when to use multiprocessing vs multithreading: https://medium.com/contentsquare-engineering-blog/multithreading-vs-multiprocessing-in-python-ece023ad55a – jeremie Jan 03 '20 at 15:09
  • Would it make a difference that each search_data function actually does the following (apologies I did not add this detail in the initial question): 1. Retrieves a database connection; 2. Initialises a cursor; 3. Runs a query; 4. Iterates through each row in the result and creates a dict; 5. Inserts this dict into a list; 6. Returns the list of dicts... Hence it is more CPU intensive than just running a query to a database – vinayman Jan 03 '20 at 15:16
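Following the async-io suggestion in the comments, a sketch using asyncio.to_thread (Python 3.9+), again with hypothetical stand-ins for the two queries:

```python
import asyncio

# Hypothetical stand-ins for the two blocking database calls.
def search_sql(title, type, release_year):
    return [{"source": "sql"}]

def search_mongo(title, type, release_year):
    return [{"source": "mongo"}]

async def main():
    # to_thread() runs each blocking call in a worker thread;
    # gather() waits for both concurrently.
    sql_results, mongo_results = await asyncio.gather(
        asyncio.to_thread(search_sql, "Alien", "movie", 1979),
        asyncio.to_thread(search_mongo, "Alien", "movie", 1979))
    return sql_results + mongo_results

results = asyncio.run(main())
print(results)
```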
