
I want to run two functions which in turn run a query against two different databases. Both functions take the same args, and it seems I could speed this up by utilising multiprocessing.

However, I am not quite sure what multiprocessing class I should use.

I initially looked at the Pool class, and it seems I could use map_async(). However, I also saw in this answer that map_async() in a similar use case does not actually let the functions run in parallel: https://stackoverflow.com/a/44155077/7468886. Does anybody know why?

If this is the case, are there any other solutions that can be used?
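For what it's worth, Pool can also run two *different* functions in parallel via apply_async(), which sidesteps map_async()'s one-function limitation. A minimal sketch, with hypothetical stand-in functions in place of the real database calls:

```python
from multiprocessing import Pool

# Hypothetical stand-ins for the two database queries; each just
# returns a labelled result so the sketch is self-contained.
def search_sql(title, type, release_year):
    return [{"source": "sql", "title": title, "release_year": release_year}]

def search_mongo(title, type, release_year):
    return [{"source": "mongo", "title": title, "release_year": release_year}]

if __name__ == "__main__":
    kwargs = {"title": "Alien", "type": "movie", "release_year": 1979}
    with Pool(processes=2) as pool:
        # apply_async() submits each function immediately and returns an
        # AsyncResult; both calls run in separate worker processes.
        sql_async = pool.apply_async(search_sql, kwds=kwargs)
        mongo_async = pool.apply_async(search_mongo, kwds=kwargs)
        # .get() blocks until each worker finishes.
        sql_results = sql_async.get()
        mongo_results = mongo_async.get()
    consolidated = sql_results + mongo_results
    print(consolidated)
```

Note that the worker functions must be defined at module level so they can be pickled, and the Pool setup belongs under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork.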

Below are the two functions. After the results are returned, I consolidate them.

    sql_manager = SQLManager()
    sql_results = sql_manager.search_data(
        title=title,
        type=type,
        release_year=release_year)

    mongo_manager = MongoManager()
    mongo_results = mongo_manager.search_data(
        title=title,
        type=type,
        release_year=release_year)
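Since both calls spend most of their time waiting on the database, they can be run concurrently with threads instead. A sketch using concurrent.futures, with placeholder manager classes standing in for the real SQLManager and MongoManager:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder managers; each search_data just returns a labelled
# result in place of a real database query.
class SQLManager:
    def search_data(self, title, type, release_year):
        return [{"source": "sql", "title": title}]

class MongoManager:
    def search_data(self, title, type, release_year):
        return [{"source": "mongo", "title": title}]

def parallel_search(title, type, release_year):
    kwargs = dict(title=title, type=type, release_year=release_year)
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both queries are submitted immediately and run concurrently;
        # .result() blocks until each one finishes.
        sql_future = executor.submit(SQLManager().search_data, **kwargs)
        mongo_future = executor.submit(MongoManager().search_data, **kwargs)
        return sql_future.result() + mongo_future.result()

results = parallel_search("Alien", "movie", 1979)
print(results)
```

Threads avoid the pickling and process-startup overhead of multiprocessing, and the GIL is released while each thread blocks on network I/O, so the two queries genuinely overlap.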

Edit:

The search_data() functions do not run a simple query but actually the following:

  • Constructs a query string based on args
  • Retrieves a class member database connection
  • Initialises a cursor
  • Executes the query
  • Iterates through each row and creates a dict from each row
  • Appends this dict into a list
  • Returns the list
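The steps above could be sketched like this, using an in-memory sqlite3 database as a stand-in for the real connection (the table name and columns are made up for illustration):

```python
import sqlite3

class SQLManager:
    def __init__(self):
        # Class-member database connection (step 2 above); an in-memory
        # sqlite3 database stands in for the real one.
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE media (title TEXT, type TEXT, release_year INTEGER)")
        self.connection.execute(
            "INSERT INTO media VALUES ('Alien', 'movie', 1979)")

    def search_data(self, title, type, release_year):
        # Construct a query string based on the args.
        query = ("SELECT title, type, release_year FROM media "
                 "WHERE title = ? AND type = ? AND release_year = ?")
        # Initialise a cursor and execute the query.
        cursor = self.connection.cursor()
        cursor.execute(query, (title, type, release_year))
        # Iterate through each row, create a dict per row,
        # append it to a list, and return the list.
        results = []
        for row in cursor.fetchall():
            results.append(
                {"title": row[0], "type": row[1], "release_year": row[2]})
        return results
```

Even with the per-row dict building, the dominant cost is usually the query itself, so the function remains I/O-bound overall.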

Given that the function does all of this, would it still lend itself to multiprocessing rather than multithreading?

    I think in this case actually you would want to use multi-threading and not multi-processing – gold_cy Jan 03 '20 at 15:00
  • I thought multiprocessing would be applicable as this would get around the GIL. Does multithreading in this example guarantee that each function would be run concurrently? – vinayman Jan 03 '20 at 15:08
  • This is a typical use case for multi-threading or async-io, here is a nice reference for when to use multiprocessing vs multithreading: https://medium.com/contentsquare-engineering-blog/multithreading-vs-multiprocessing-in-python-ece023ad55a – jeremie Jan 03 '20 at 15:09
  • Would it make a difference that each search_data function actually does the following (apologies I did not add this detail in the initial question): 1. Retrieves a database connection; 2. Initialises a cursor; 3. Runs a query; 4. Iterates through each row in the result and creates a dict; 5. Inserts this dict into a list; 6. Returns the list of dicts... Hence it is more CPU intensive than just running a query to a database – vinayman Jan 03 '20 at 15:16
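Following the async-io suggestion in the comments, a sketch using asyncio.to_thread (Python 3.9+), again with hypothetical stand-ins for the two queries:

```python
import asyncio

# Hypothetical stand-ins for the two blocking database calls.
def search_sql(title, type, release_year):
    return [{"source": "sql"}]

def search_mongo(title, type, release_year):
    return [{"source": "mongo"}]

async def main():
    # to_thread() runs each blocking call in a worker thread;
    # gather() waits for both concurrently.
    sql_results, mongo_results = await asyncio.gather(
        asyncio.to_thread(search_sql, "Alien", "movie", 1979),
        asyncio.to_thread(search_mongo, "Alien", "movie", 1979))
    return sql_results + mongo_results

results = asyncio.run(main())
print(results)
```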
