This is the sample table in panda data frame ( actual row count is approx. 200K)
Acct Ent Co Period Actuals
11 100 Global Jan 1000
11 100 Global Jan 1000
11 100 Global Feb 1500
11 100 Global Feb 1200
11 100 Global Mar 1600
11 100 Global Mar 1300
the output I intent to get is - a comparison between actual and Run rate ( group by period). Run rate means previous months Average. Suppose if I select period 'March' and run the query, I should get following output. Here 2350 in run rate column is average of Jan & Feb total (2000+2700)/2 = 2350
Acct Ent Co Period Actuals Run Rate
11 100 Global Mar 2900 2350
I' am able to achieve this through 'for' loop running through each row but it is not practical for heavy dataset. In case of huge dataset, loops takes more than 6-7 hours to run.
is there any way we can achieve this without creating 'for' loop? looking for some directions.
Following code I am using which is fine but as mentioned above not practical.
def runrate():
global excel_data_df
x=len(excel_data_df)-1
runrate=[]
for i in range (0,x+1):
data1 =excel_data_df.loc[i]
sum1 =0
d=x
list1=finder(data1.Period)
if(data1.Period=="Jan"):
runrate.append(0)
elif(isinstance(list1,list)==False):
runrate.append(sum1)
else:
for items in list1:
d=i
while(d>=0):
data2=excel_data_df.loc[d]
if(items== data2.Period and data1.Account== data2.Account and data1.Country== data2.Country and data2.Functions==data1.Functions):
sum1=sum1+data2.Actuals
else:
pass
d=d-1
sum1=sum1/len(list1)
runrate.append(sum1)
excel_data_df["runrate"]=runrate