Sample dataframe:
import pandas as pd
df1 = pd.DataFrame({'date': ['2022-02-25', '2022-02-26', '2022-02-25'],
                    'attempt_1': [1, 0, 0],
                    'attempt_2': [0, 0, 1],
                    'attempt_3': [1, 1, 0]})
df1
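which displays as:

         date  attempt_1  attempt_2  attempt_3
0  2022-02-25          1          0          1
1  2022-02-26          0          0          1
2  2022-02-25          0          1          0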
The following code takes about 1 hour 30 minutes to process roughly 252k rows and 75 columns. That is far too long, and I don't know of any method that could replace it to reduce the processing time.
dategrp = set(df1["date"])
dfs_to_dategrp = []
for trial in df1.columns[1:]:  # skip the 'date' column itself
    save_sf = {k: {"success": 0, "fail": 0} for k in dategrp}
    for c in dategrp:
        # Boolean-filter the whole frame once per date/column/outcome combination
        save_sf[c]["success"] += len(df1[(df1["date"] == c) & (df1[trial] == 1)])
        save_sf[c]["fail"] += len(df1[(df1["date"] == c) & (df1[trial] == 0)])
    new_df1 = pd.DataFrame(save_sf)
    new_df1 = new_df1.T
    dfs_to_dategrp.append(new_df1)
result = pd.concat(dfs_to_dategrp, axis=1)
result
So in the dataframe, the attempt columns are populated with 1s (success) and 0s (fail). This code goes through every row and column (which is what makes it so slow) to group and total the 1s and 0s by the date of occurrence.
Here is the output for the sample above (row and column order may vary, since dategrp is a set):
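            success  fail  success  fail  success  fail
2022-02-25        1     1        1     1        1     1
2022-02-26        0     1        0     1        1     0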
Any help given will be appreciated. Thanks.
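For reference, here is a vectorized sketch of the same aggregation that avoids the Python-level loops entirely. It assumes every attempt column contains only 0s and 1s, and its column layout (suffixed names instead of interleaved success/fail pairs) differs from the loop version:

import pandas as pd

# Count of 1s per date for every attempt column at once.
success = df1.groupby("date").sum()

# Count of 0s per date = rows per date minus the count of 1s.
fail = df1.groupby("date").count() - success

# Side-by-side result: one '<column>_success' and one '<column>_fail' per attempt.
result = pd.concat([success.add_suffix("_success"),
                    fail.add_suffix("_fail")], axis=1)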