The code below creates a dictionary of permutations from a list of arrays of customer ids/indices. Running it as a script generates a basic sample set:
```python
import itertools

import numpy as np
import pandas as pd


def func(u_data):
    # Materialize every permutation of u_data as one DataFrame row,
    # then key the dict on the permutation's last column.
    perm_ = pd.DataFrame(itertools.permutations(u_data))
    p_ = perm_.set_index(perm_.shape[1] - 1).to_dict()
    return p_


if __name__ == "__main__":
    cust_indices = [np.array([90, 91]), np.array([100, 101]), np.array([68, 69])]
    temp_indices = list(map(func, cust_indices))
```
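For reference, with the first array the returned dict is keyed on the last element of each permutation (the outer key `0` is the surviving column label):

```python
>>> func(np.array([90, 91]))
{0: {91: 90, 90: 91}}
```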
When the arrays in `cust_indices` contain more elements, the process is killed on AWS EC2 (an out-of-memory error). The code crashes at `perm_ = pd.DataFrame(itertools.permutations(u_data))` when, for example:

```python
cust_indices = [np.array([90, 91]), np.array([100, 101]), np.array([68, 69]),
                np.array([1234372, 1234373, 1234374, 1234375, 1234376, 1234377,
                          1234378, 1234379, 1234380, 1234381, 1234382, 1234383,
                          1234384, 1234385])]
```
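For scale, the fourth array alone makes the permutation count astronomically large; a quick back-of-the-envelope check (assuming int64 columns, i.e. 8 bytes per value):

```python
import math

n = 14                     # length of the fourth array
count = math.factorial(n)  # 87,178,291,200 permutations
row_bytes = n * 8          # 14 int64 values per row
print(f"{count:,} rows, ~{count * row_bytes / 1e12:.1f} TB")
# 87,178,291,200 rows, ~9.8 TB before any pandas overhead
```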
I am currently trying to optimize the code to handle a larger dataset and prevent the OOM error, either by using multiprocessing or by updating the line `perm_ = pd.DataFrame(itertools.permutations(u_data))`.
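As a first attempt, a lazy variant might look like this (`func_lazy` is a hypothetical name; it assumes the downstream code can consume one permutation at a time, since storing all of them anywhere would still require factorial space):

```python
import itertools

import numpy as np


def func_lazy(u_data):
    # Yield (last element, remaining elements) pairs one at a time,
    # loosely mirroring the set_index-on-the-last-column keying above,
    # without ever holding all permutations in memory at once.
    for perm in itertools.permutations(u_data):
        yield perm[-1], perm[:-1]


if __name__ == "__main__":
    gen = func_lazy(np.arange(1234372, 1234386))  # the 14-element array
    for key, rest in itertools.islice(gen, 3):    # peek at the first few
        print(key, rest)
```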