The goal of the code is to generate a large dataset (1,000,000 or more samples). engine is a class instance used to generate a sample, and each sample has shape (8, 8, 8). Here is a simplified example of what I am doing:
import os
from multiprocessing import Pool
from time import perf_counter

def collect_sample_mp(inputs):
    # inputs is (engine, low); each call returns one (8, 8, 8) sample
    return inputs[0].sample(inputs[1])

def mp_generator(engine, num_samples=10000, low=10):
    start_time = perf_counter()
    with Pool(os.cpu_count()) as pool:
        results = pool.map(collect_sample_mp, [(engine, low)] * num_samples)
    print(f"Time: {perf_counter() - start_time}")
    return results
Or a dumbed-down version that runs without the engine:
import os
import numpy as np
from multiprocessing import Pool
from time import perf_counter

def collect_sample_mp(inputs):
    # stand-in for the real engine call; the inputs are ignored here
    return np.random.randn(8, 8, 8)

def mp_generator(temp1=1, num_samples=10000, temp2=1):
    start_time = perf_counter()
    with Pool(os.cpu_count()) as pool:
        results = pool.map(collect_sample_mp, [(temp1, temp2)] * num_samples)
    print(f"Time: {perf_counter() - start_time}")
    return results
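The returned list then gets assembled into a single array (stacking is my assumption of the final step; it is not shown above):

import numpy as np

samples = mp_generator(num_samples=10000)
dataset = np.stack(samples)  # shape: (10000, 8, 8, 8)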
The second example is shown because I pass the same input variables for every sample but still want num_samples samples.
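One variation would be to pass an explicit chunksize to pool.map so that tasks are batched into fewer, larger messages (a minimal sketch of the dumbed-down version; the value 256 is an arbitrary guess):

import os
import numpy as np
from multiprocessing import Pool
from time import perf_counter

def collect_sample_mp(inputs):
    # stand-in for the real engine call; the inputs are ignored here
    return np.random.randn(8, 8, 8)

def mp_generator_chunked(temp1=1, num_samples=10000, temp2=1):
    start_time = perf_counter()
    with Pool(os.cpu_count()) as pool:
        # chunksize batches the task list so each worker fetches
        # 256 tasks at a time instead of one
        results = pool.map(collect_sample_mp,
                           [(temp1, temp2)] * num_samples,
                           chunksize=256)
    print(f"Time: {perf_counter() - start_time}")
    return results

This cuts down on inter-process messaging, but every (8, 8, 8) result is still pickled and copied back to the parent process.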
Is there a faster alternative?
I have tried using ray, but that package has caused me issues: when I increase the number of iterations, the program times out or errors out, and I have not found helpful solutions online.
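For context, my ray attempt looked roughly like this (a minimal sketch using the same placeholder sampler; my real code calls engine.sample):

import numpy as np
import ray

ray.init()

@ray.remote
def collect_sample(low):
    # placeholder for engine.sample(low)
    return np.random.randn(8, 8, 8)

def ray_generator(num_samples=10000, low=10):
    # one remote task per sample; at large num_samples this is
    # where the timeouts/errors showed up for me
    futures = [collect_sample.remote(low) for _ in range(num_samples)]
    return ray.get(futures)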
So, is there a better way to parallelize the computation?