I have generated a list of permutations in Python with either itertools.permutations (all permutations) or numpy.random.Generator.permuted (a random sample of the permutations), depending on how many permutations there are in total. This part of the code works well and runs quickly.
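For readers unfamiliar with the two approaches, here is a minimal sketch of how they differ. The block count of 4 and the sample size of 10 are hypothetical values chosen only to keep the example small; they are not the values used in my real code.

```python
from itertools import permutations

import numpy as np

# Hypothetical small example: 4 blocks, so 4! = 24 possible orderings.
blocks = np.arange(4)

# Exhaustive: itertools.permutations yields every ordering exactly once.
all_perms = list(permutations(blocks.tolist()))

# Sampled: build an (n_samples, n_blocks) array and shuffle each row
# independently with Generator.permuted (rows may repeat by chance).
rng = np.random.default_rng(1234)
n_samples = 10
sampled = rng.permuted(np.tile(blocks, (n_samples, 1)), axis=1)

print(len(all_perms), sampled.shape)
```

The exhaustive list grows factorially, which is why I switch to random sampling once there are more than 9 blocks.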
However, the resulting list is large (100k entries or more), and I would like to process it with multiple threads, but I don't know how to do that.
Here is what I have so far. The code works, but it is inefficient and takes a long time to complete. I have tried to use multiprocessing.Pool(), but I have not been able to make it work.
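import numpy as np
import pandas as pd
from itertools import permutations
from sklearn.linear_model import LinearRegression

# Example data: 16 blocks with three measured variables
avg = {'Block': np.arange(1, 17, 1),
       'A': np.random.uniform(0, 1, 16),
       'B': np.random.uniform(0, 1, 16),
       'C': np.random.uniform(0, 1, 16)
       }
avg = pd.DataFrame(avg)
seed = 1234   # seed for the random permutations
thr = 100000  # number of random permutations to draw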
def permuta(avg, seed, thr):
    kpi = []
    # Permutations
    if len(pd.unique(avg.Block)) > 9:
        # Too many blocks to enumerate: draw thr random orderings instead
        rng = np.random.default_rng(seed)
        blocks = pd.unique(avg.Block - 1).astype(int)
        perm = rng.permuted(np.tile(blocks, thr).reshape(thr, blocks.size), axis=1)
        aa = list(perm)
        aa1 = [tuple(x) for x in aa]
        # Make sure the original (identity) ordering is included
        bb = [np.arange(0, len(avg))]
        bb1 = tuple(bb[0])
        if bb1 not in aa1:
            bb.extend(aa)
            aa = [tuple(x) for x in bb]
        else:
            aa = [tuple(x) for x in aa]
    else:
        # Few blocks: enumerate every ordering
        perm = permutations(avg.index)
        aa = list(perm)
    n0 = len(aa)
    for i in aa:
        if (aa.index(i)+1) % 1000 == 0:
            print(' progress: {:.2f}%'.format((aa.index(i)+1)/n0*100))
        # Reorder the rows and regress each variable on the new position
        df = avg.loc[list(i)]
        df.reset_index(drop=True, inplace=True)
        model_A = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1,1), df.A)
        model_B = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1,1), df.B)
        model_C = LinearRegression(fit_intercept=True).fit(df.index.values.reshape(-1,1), df.C)
        block_order_id = tuple(x+1 for x in i)
        model_kpi = [block_order_id, model_A.coef_[0], model_B.coef_[0], model_C.coef_[0]]
        kpi.append(model_kpi)
    kpi = pd.DataFrame(kpi, columns=['Block_ord', 'm_A', 'm_B', 'm_C'])
    return kpi
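One direction I am considering for replacing the loop is sketched below. Two things in the loop look expensive: aa.index(i) scans the list on every iteration (enumerate would give the position directly), and three LinearRegression objects are fitted per permutation. Since only the slope of a one-variable regression is needed, it can be written in closed form as sum(xc * y) / sum(xc**2), where xc is the centered position 0..n-1, and evaluated for all permutations at once. The function name permuted_slopes and its arguments are hypothetical, and this is an untuned sketch that assumes the permutations fit in memory as one integer array.

```python
import numpy as np


def permuted_slopes(values, perms):
    """Least-squares slope of each column against position 0..n-1.

    values: (n, k) array, one column per variable (e.g. A, B, C).
    perms:  (m, n) integer array, one row ordering per row.
    Returns an (m, k) array of slopes, one row per permutation.
    """
    values = np.asarray(values, dtype=float)
    perms = np.asarray(perms, dtype=np.intp)
    x = np.arange(values.shape[0], dtype=float)
    xc = x - x.mean()  # centered positions; they sum to zero
    # values[perms] has shape (m, n, k): the reordered data per permutation.
    # Slope of simple regression = sum(xc * y) / sum(xc**2).
    return np.einsum('j,ijk->ik', xc, values[perms]) / (xc @ xc)


# Usage with made-up data shaped like the example above
rng = np.random.default_rng(1234)
values = rng.uniform(0, 1, size=(16, 3))
perms = rng.permuted(np.tile(np.arange(16), (1000, 1)), axis=1)
slopes = permuted_slopes(values, perms)  # shape (1000, 3)
```

The block order identifiers can then be attached afterwards, for example as perms + 1, instead of being built inside the loop.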
I would be grateful if someone could help me speed up the code, whether by using all cores for the calculations, by replacing the "for" loop with a more efficient approach, or by a mix of both.
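To make the "all cores" idea concrete, this is roughly the shape I have in mind: split the permutation array into chunks and hand each chunk to a worker process with multiprocessing.Pool. The names slopes_for_chunk and parallel_slopes, the worker count, and the chunk count are all hypothetical, and the worker uses the closed-form slope of a one-variable regression rather than LinearRegression. I am not sure this is the right structure; when the per-chunk work is this cheap, the cost of starting processes and pickling the data may outweigh any gain.

```python
import numpy as np
from multiprocessing import Pool


def slopes_for_chunk(args):
    """Worker: least-squares slopes for one chunk of permutations."""
    values, perms = args
    x = np.arange(values.shape[0], dtype=float)
    xc = x - x.mean()  # centered positions 0..n-1
    # Slope of simple regression = sum(xc * y) / sum(xc**2)
    return np.einsum('j,ijk->ik', xc, values[perms]) / (xc @ xc)


def parallel_slopes(values, perms, n_workers=2, n_chunks=8):
    """Split perms into chunks and process them in separate processes."""
    tasks = [(values, chunk) for chunk in np.array_split(perms, n_chunks)]
    with Pool(processes=n_workers) as pool:
        # Pool.map returns results in the same order as the tasks
        results = pool.map(slopes_for_chunk, tasks)
    return np.vstack(results)


if __name__ == "__main__":
    # The guard is needed so worker processes can import this module safely
    rng = np.random.default_rng(1234)
    values = rng.uniform(0, 1, size=(16, 3))
    perms = rng.permuted(np.tile(np.arange(16), (10_000, 1)), axis=1)
    slopes = parallel_slopes(values, perms)
    print(slopes.shape)
```

The worker has to be a module-level function so that it can be pickled and sent to the other processes, which may be where my earlier attempts with Pool went wrong.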
Thanks for your help