How can I parallelize this python code to improve speed?

Question

The goal of the code is to generate a large dataset (1,000,000 or more samples). The object engine is a class instance used to generate a sample, where each sample is an array of shape (8,8,8). Here is a simplified example of what I am doing:

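import os
from multiprocessing import Pool
from time import perf_counter

def collect_sample_mp(inputs):
    engine, low = inputs  # Pool.map passes each (engine, low) tuple as one argument
    return engine.sample(low)

def mp_generator(engine, num_samples=10000, low=10):
    start_time = perf_counter()
    with Pool(os.cpu_count()) as pool:  # worker processes are cleaned up on exit
        results = pool.map(collect_sample_mp, [(engine, low)] * num_samples)
    print(f"Time: {perf_counter() - start_time}")
    return results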
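Because every task in this example receives the same (engine, low) pair, the engine does not have to travel with the task arguments at all: with [(engine, low)]*num_samples, each worker unpickles a fresh copy of engine for every chunk of tasks it receives. A minimal sketch of an alternative, assuming engine is picklable: hand it to each worker once through Pool's initializer/initargs and let each task return a whole batch of samples. The Engine class is a hypothetical stand-in for the real one, and batch_size is an arbitrary tuning knob.

```python
import os
from multiprocessing import Pool

import numpy as np


class Engine:
    """Hypothetical stand-in for the real engine class (must be picklable)."""

    def sample(self, low):
        return np.random.randn(8, 8, 8)


_worker_engine = None  # set once per worker process by _init_worker


def _init_worker(engine):
    global _worker_engine
    _worker_engine = engine
    # Give each worker its own NumPy global RNG state (fresh OS entropy)
    np.random.seed()


def _sample_batch(args):
    low, n = args
    # Reuse the engine that was shipped once at worker start-up
    return np.stack([_worker_engine.sample(low) for _ in range(n)])


def mp_generator_init(engine, num_samples=10000, low=10, batch_size=1000):
    # Split num_samples into full batches plus one smaller remainder batch
    sizes = [batch_size] * (num_samples // batch_size)
    if num_samples % batch_size:
        sizes.append(num_samples % batch_size)
    with Pool(os.cpu_count(), initializer=_init_worker, initargs=(engine,)) as pool:
        batches = pool.map(_sample_batch, [(low, n) for n in sizes])
    return np.concatenate(batches)  # shape (num_samples, 8, 8, 8)


if __name__ == "__main__":
    data = mp_generator_init(Engine(), num_samples=10000)
    print(data.shape)
```

If the real engine keeps its own random state, every worker starts from an identical copy of it, so it may also need to be reseeded inside the initializer.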

Or, a simplified version that can actually be run:

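import os
from multiprocessing import Pool
from time import perf_counter
import numpy as np

def collect_sample_mp(inputs):
    return np.random.randn(8, 8, 8)

def mp_generator(temp1=1, num_samples=10000, temp2=1):
    start_time = perf_counter()
    with Pool(os.cpu_count()) as pool:
        results = pool.map(collect_sample_mp, [(temp1, temp2)] * num_samples)
    print(f"Time: {perf_counter() - start_time}")
    return results

if __name__ == "__main__":  # required when workers are spawned rather than forked
    mp_generator()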

I show the second example because I use the same input variables for every sample; I just want num_samples samples.
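Since only the number of samples matters here, one option is to send a handful of large batches instead of num_samples tiny tasks, and to give each batch its own random stream. A minimal sketch, assuming the samples are standard-normal like np.random.randn(8,8,8) in the runnable example; generate_batch and mp_generator_batched are hypothetical names. Independent streams matter: with the fork start method (historically the default on Linux), pool workers inherit a copy of NumPy's global random state, so calling np.random.randn in every worker can return the same values in different workers.

```python
import os
from multiprocessing import Pool

import numpy as np


def generate_batch(args):
    """Generate n samples of shape (8, 8, 8) from one independent RNG stream."""
    seed_seq, n = args
    rng = np.random.default_rng(seed_seq)
    # One vectorised call per batch instead of one task per sample
    return rng.standard_normal((n, 8, 8, 8))


def mp_generator_batched(num_samples=10000, num_workers=None, seed=None):
    num_workers = num_workers or os.cpu_count()
    # Split num_samples into num_workers nearly equal batch sizes
    base, extra = divmod(num_samples, num_workers)
    sizes = [base + 1 if i < extra else base for i in range(num_workers)]
    # SeedSequence.spawn derives statistically independent child seeds
    child_seeds = np.random.SeedSequence(seed).spawn(num_workers)
    with Pool(num_workers) as pool:
        batches = pool.map(generate_batch, list(zip(child_seeds, sizes)))
    return np.concatenate(batches)  # shape (num_samples, 8, 8, 8)


if __name__ == "__main__":
    samples = mp_generator_batched(num_samples=10_000, seed=0)
    print(samples.shape)
```

Passing a fixed seed makes the whole dataset reproducible for a given num_workers, since pool.map returns the batches in order.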

Is there a faster alternative?

I have tried using Ray, but it has caused me problems: when I increase the number of iterations, the program times out or errors out, and I have not found helpful solutions online.

So, is there a better way to parallelize the computation?

Pandas for the win! If you can start to exploit vectorisation (and it looks like you're doing common computations with changing entries, i.e. SIMD-valid [Single Instruction, Multiple Data]), then you can see some astronomical speed-ups. If this is still unanswered later this evening, I'll try to put something together for you. — Amiga500, May 25 '22 at 13:27
@Amiga500 could you provide an example of this? I'm curious about the application. Could this also be done with NumPy's vectorization? — user16573587, May 25 '22 at 22:57
Sorry, I didn't get near it last night. And double sorry: I don't think I've completely grasped your problem. Is engine returning an (8,8,8) that collect_sample_mp then manipulates, or is collect_sample_mp itself returning the (8,8,8)? Can you double-check the two code examples? They seem inconsistent (or I'm missing something). If engine produces a unique (8,8,8) on every call, which collect_sample_mp then manipulates to return low*num_samples, you first need to determine whether engine() or the subsequent operations are the bottleneck. — Amiga500, May 26 '22 at 10:18
If engine is generating unique (8,8,8)s and that is your time-consuming part (which makes sense from a distance!), then what is the varying input to it? Typically, Python is dog-slow at starting separate processes and passing work between them, so you want to pay attention to chunksize. See here: https://stackoverflow.com/questions/53751050/multiprocessing-understanding-logic-behind-chunksize — Amiga500, May 26 '22 at 10:30
@Amiga500 collect_sample_mp calls a method on the engine that returns a sample, and collect_sample_mp returns that sample. (The second example can be ignored; I was just supplying something runnable.) Nothing in the inputs varies: they are the same for every sample generated. — user16573587, May 26 '22 at 13:47
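On the NumPy question in the comments: when the per-sample work is itself NumPy code, as in the runnable example, vectorisation can remove the need for a process pool by drawing many samples in one call. A minimal sketch, assuming standard-normal samples; vectorised_generator is a hypothetical name, and an arbitrary engine.sample would first have to be rewritten to operate on whole batches. It yields chunks because 1,000,000 samples of shape (8,8,8) occupy about 2 GB as float32 (about 4 GB as float64).

```python
import numpy as np


def vectorised_generator(num_samples=10000, seed=None, chunk_size=100_000):
    """Yield float32 arrays of shape (k, 8, 8, 8), k <= chunk_size, num_samples rows in total."""
    rng = np.random.default_rng(seed)
    remaining = num_samples
    while remaining > 0:
        k = min(chunk_size, remaining)
        # One call fills k * 512 values in compiled code; no per-sample Python loop
        yield rng.standard_normal((k, 8, 8, 8), dtype=np.float32)
        remaining -= k


if __name__ == "__main__":
    data = np.concatenate(list(vectorised_generator(50_000, seed=0)))
    print(data.shape, data.dtype)
```

For the full dataset, each chunk can be written to disk as it is produced instead of concatenating everything in memory.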
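On the chunksize point in the comments: Pool.map already computes a default of roughly len(iterable) / (4 × number of workers) when none is given, whereas imap and imap_unordered default to chunksize=1, so the setting matters most with those lazy variants. They are also convenient here because they need no num_samples-long input list. A minimal sketch using the toy task from the question; mp_generator_chunked and _reseed_worker are hypothetical names, and chunksize=1000 is an arbitrary starting point to tune.

```python
import os
from itertools import repeat
from multiprocessing import Pool

import numpy as np


def collect_sample_mp(inputs):
    # Toy task from the question: ignores its inputs, returns one (8, 8, 8) sample
    return np.random.randn(8, 8, 8)


def _reseed_worker():
    # Forked workers inherit identical NumPy global RNG state; reseed from OS entropy
    np.random.seed()


def mp_generator_chunked(num_samples=10000, chunksize=1000, inputs=(1, 1)):
    out = np.empty((num_samples, 8, 8, 8))
    with Pool(os.cpu_count(), initializer=_reseed_worker) as pool:
        # repeat() feeds the same inputs lazily; each message to a worker
        # carries `chunksize` tasks instead of one
        results = pool.imap_unordered(collect_sample_mp,
                                      repeat(inputs, num_samples),
                                      chunksize=chunksize)
        for i, sample in enumerate(results):
            out[i] = sample
    return out


if __name__ == "__main__":
    print(mp_generator_chunked(num_samples=10000).shape)
```

imap_unordered returns results in completion order, which is harmless here because every sample is produced from the same inputs.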


0 Answers