2

someList = [x for x in someList if not isOlderThanXDays(x, XDays, DtToday)]

I have this line, and the function isOlderThanXDays makes some API calls, which makes it slow. I would like to run it using multiprocessing/parallel processing in Python. The order in which the list is processed doesn't matter (so asynchronous, I think).

The function isOlderThanXDays essentially returns a boolean value, and the list comprehension keeps every item that is not older than XDays in the new list.

Edit: Params of the function: XDays is passed in by the user, let's say 60 days, and DtToday is today's date (a datetime object). I then make API calls to read the file's modified date from its metadata and return True if the file is older, otherwise False.
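For readers who want something runnable, here is a minimal sketch of what such a function might look like. The real implementation is not shown in the question, so `fetch_modified_date` and `FAKE_METADATA` are hypothetical placeholders for the slow metadata API call, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the metadata API: file name -> last modified date.
FAKE_METADATA = {
    "report.csv": datetime(2019, 11, 1),
    "archive.zip": datetime(2019, 6, 1),
}


def fetch_modified_date(x):
    """Placeholder for the real (slow) API call; here it is just a dict lookup."""
    return FAKE_METADATA[x]


def isOlderThanXDays(x, XDays, DtToday):
    """Return True if x was last modified more than XDays before DtToday."""
    return DtToday - fetch_modified_date(x) > timedelta(days=XDays)


DtToday = datetime(2019, 11, 21)
someList = ["report.csv", "archive.zip"]

# Serial version from the question: keep everything that is NOT older than 60 days.
someList = [x for x in someList if not isOlderThanXDays(x, 60, DtToday)]
print(someList)
```

In this toy data only "report.csv" survives the filter, because "archive.zip" was modified more than 60 days before DtToday.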

I am looking for something similar to the question below. The difference is that in that question every list input produces an output, whereas mine filters the list based on the boolean value returned by the function, so I don't know how to apply it in my scenario.

How to parallelize list-comprehension calculations in Python?

Mars
Jay M
  • Please show us what you have done and ask a specific technical question. – Mars Nov 21 '19 at 07:18
  • Can you tell us the isOlderThanXDays params? – kederrac Nov 21 '19 at 07:50
  • @Mars I can't show the code for the function because it's for my workplace, and it's also complex and unrelated. Essentially, I need to apply a list comprehension based on the value being True or False, so I am filtering the new list from the old list. I have edited the question to add the details of an example link; I hope that helps. – Jay M Nov 21 '19 at 08:13
  • The params aren't important here.... – Mars Nov 21 '19 at 08:15
  • @rusu_ro1 Not sure if the parameters will help but I have made an edit to add parameters as well as a description of the function :) – Jay M Nov 21 '19 at 08:15
  • @JayM In general, stackoverflow is for specific questions, not requests for someone to code something for you. – Mars Nov 21 '19 at 08:17
  • If there is something specific about the linked question that you don't understand, you should ask that – Mars Nov 21 '19 at 08:24

2 Answers

3

This should run all of your checks in parallel, and then filter out the ones that failed the check.

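import multiprocessing

try:
    cpus = multiprocessing.cpu_count()
except NotImplementedError:
    cpus = 2   # arbitrary default


def MyFilterFunction(x):
    # Return the item itself if it should be kept, otherwise None.
    # XDays and DtToday must be defined at module level so the workers can see them.
    if not isOlderThanXDays(x, XDays, DtToday):
        return x
    return None


if __name__ == "__main__":  # required on platforms that spawn worker processes
    with multiprocessing.Pool(processes=cpus) as pool:
        parallelized = pool.map(MyFilterFunction, someList)
    # Compare against None so that falsy items (0, "", ...) are not dropped.
    newList = [x for x in parallelized if x is not None]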
Mars
  • Thanks heaps, I should have thought of this! I'll test both solutions at work on Tuesday (I can only access the REST API on their server), but for now I have marked the other one as correct because my list has 25,000 - 100,000 objects. I predict that checking whether each object exists in the list may be slightly slower. I'll change the accepted answer if this is more efficient. – Jay M Nov 21 '19 at 14:22
  • @JayM The answer you accepted does the exact same thing (inserts None, or assumes that `isOlderThanXDays` returns a None for you). The only benefit is that it uses `partial` which is a nice way to set XDays, DtToday. – Mars Nov 25 '19 at 05:29
  • PS, you need to set XDays and DtToday somehow too. Partial will do this for you, or you can set it yourself some other way. I just showed you how to use the code that you linked, as-is – Mars Nov 25 '19 at 05:30
1

You can use ThreadPool:

from multiprocessing.pool import ThreadPool # Class which supports an async version of applying functions to arguments
from functools import partial

NUMBER_CALLS_SAME_TIME = 10 # take care to avoid throttling
# Assume that the isOlderThanXDays signature is isOlderThanXDays(x, XDays, DtToday)
my_api_call_func = partial(isOlderThanXDays, XDays=XDays, DtToday=DtToday)
pool = ThreadPool(NUMBER_CALLS_SAME_TIME)
# responses holds one boolean per item, in the same order as someList
responses = pool.map(my_api_call_func, someList)
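Since `pool.map` returns one result per input in the original order, the booleans can be paired back with the items to get the filtered list the question asks for. The following is a minimal, self-contained sketch of that step, not the asker's real code: `isOlderThanXDays` here is a hypothetical stand-in that takes `(name, modified_date)` tuples and sleeps briefly to mimic API latency, and `filter_recent` is a helper name invented for this example.

```python
import time
from datetime import datetime, timedelta
from functools import partial
from multiprocessing.pool import ThreadPool


def isOlderThanXDays(x, XDays, DtToday):
    """Hypothetical stand-in: x is a (name, modified_date) tuple."""
    time.sleep(0.01)  # mimic the latency of a real API call
    return DtToday - x[1] > timedelta(days=XDays)


def filter_recent(items, XDays, DtToday, workers=10):
    """Run the checks in a thread pool and keep the items that are not too old."""
    check = partial(isOlderThanXDays, XDays=XDays, DtToday=DtToday)
    with ThreadPool(workers) as pool:
        is_old = pool.map(check, items)  # booleans, same order as items
    # Pair each item with its result and keep the ones that are not old.
    return [x for x, old in zip(items, is_old) if not old]


DtToday = datetime(2019, 11, 21)
someList = [
    ("a", datetime(2019, 11, 1)),
    ("b", datetime(2019, 6, 1)),
    ("c", datetime(2019, 10, 15)),
]
someList = filter_recent(someList, XDays=60, DtToday=DtToday)
print([name for name, _ in someList])
```

Threads are a reasonable fit here because the work is dominated by waiting on network calls rather than CPU time; the `workers` value is an arbitrary choice and may need to be lowered if the API throttles concurrent requests.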
kederrac
  • This is what I was looking for! I'll verify the efficiency on Tuesday when I am at work (I can only access the REST API on the server). – Jay M Nov 21 '19 at 14:23