I have a script where I need to aggregate data based on some logic. For example, here is my DataFrame:
1. df.columns = ['a','b','c','d']
2. The DataFrame contains more than 1 million rows (10 million in some cases).
3. I need to aggregate the data with something like the following (4 nested loops):
groupby_a = df.groupby(['a'])
for i, df_a in groupby_a:
    # some logic...
    groupby_b = df_a.groupby(['b'])
    for j, df_b in groupby_b:  # loop again
        # logic, followed by 2 more nested loops (groupby 'c' and 'd')
        ...
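To make the question concrete, here is a minimal sketch of what I mean. The data below is simulated (my real values differ), and size() is only a placeholder for my actual per-group logic; I am wondering whether a single groupby over all four keys like this would be faster than the nested loops:

import numpy as np
import pandas as pd

# Simulated data with the same column layout (values are made up)
n = 1_000_000
rng = np.random.default_rng(0)
df = pd.DataFrame({col: rng.integers(0, 100, n) for col in ['a', 'b', 'c', 'd']})

# One grouping over all four keys instead of four nested loops;
# size() here stands in for whatever per-group aggregation applies
result = df.groupby(['a', 'b', 'c', 'd']).size()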
My issue is that this takes too much time to process. Is there any way I can improve the performance? Any help is really appreciated.