I have written some code (see below) that performs a calculation on a dataset and adds the result to it as a new column.
ratio_list = []
for s, p, f in zip(A["s"], A["p"], A["f"]):
    m = A[(A["s"]==s) & (A["p"]==p) & (A["f"]<f)][['a', 't']].product(axis=1).sum()
    n = A[(A["s"]==s) & (A["p"]==p) & (A["f"]<f)]['a'].sum()
    if n == 0:
        ratio_list.append(0)
    else:
        ratio_list.append(m/n)
A["ratio"] = ratio_list
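Here, `A` is a pandas data frame, and `s`, `p`, `f`, `a`, `t` are column names. I want to add a column `ratio` holding the result of the calculation shown in the code: for each row, the sum of `a*t` divided by the sum of `a`, taken over all rows with the same `s` and `p` and a strictly smaller `f` (or 0 if that sum of `a` is 0).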
This code takes 10 minutes to run in a Jupyter notebook. Is there a way to rewrite it so that it runs faster?
Sample data in CSV format, with the result in the "ratio" column (I couldn't attach a file):
,s,p,f,a,t,ratio
0,101,2018,2018-01-06,2.0,10.0,13.0
1,101,2018,2018-01-06,2.0,12.0,13.0
2,101,2018,2018-01-03,4.0,14.0,0.0
3,101,2018,2018-01-03,16.0,12.0,0.0
4,101,2018,2018-01-03,12.0,14.0,0.0
5,101,2018,2018-01-06,4.0,10.0,13.0
6,101,2018,2018-01-06,14.0,23.0,13.0
7,101,2018,2018-01-08,4.0,10.0,15.222222222222221
8,101,2018,2018-01-08,20.0,14.0,15.222222222222221
9,101,2018,2018-01-08,21.0,23.0,15.222222222222221
10,101,2018,2018-01-08,21.0,23.0,15.222222222222221
11,101,2018,2018-01-09,4.0,8.0,17.566666666666666
12,101,2018,2018-01-09,10.0,14.0,17.566666666666666
13,101,2018,2018-01-13,13.0,23.0,17.01492537313433
14,101,2018,2018-01-13,9.0,23.0,17.01492537313433
15,103,2018,2018-01-31,20.0,15.0,0.0
16,103,2018,2018-01-31,2.0,15.0,0.0
17,103,2018,2018-01-31,20.0,15.0,0.0
18,103,2018,2018-01-31,20.0,15.0,0.0
19,103,2018,2018-01-31,20.0,15.0,0.0
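
One direction that might avoid the row-by-row filtering is sketched below: a minimal, hedged version that uses grouped cumulative sums instead of rebuilding the mask for every row. It assumes the values in `f` sort chronologically (ISO-formatted date strings or datetimes) and that `A` does not already contain a `ratio` column. The function name `add_ratio` and the helper column `at_prod` are made up for illustration and are not part of the original code.

```python
import pandas as pd

def add_ratio(A):
    """Return a copy of A with a 'ratio' column (hypothetical helper)."""
    # Total a*t and total a for every distinct (s, p, f) combination.
    # sort=True orders the group keys, so f is ascending within each (s, p).
    totals = (
        A.assign(at_prod=A["a"] * A["t"])
         .groupby(["s", "p", "f"], sort=True)[["at_prod", "a"]]
         .sum()
    )

    # Running totals within each (s, p), shifted by one step so that each
    # f only sees the totals of strictly earlier f values.
    prior = (
        totals.groupby(level=["s", "p"]).cumsum()
              .groupby(level=["s", "p"]).shift(1)
              .fillna(0.0)
    )

    # m / n where n != 0, otherwise 0 (same rule as the loop).
    ratio = (prior["at_prod"] / prior["a"]).where(prior["a"] != 0, 0.0)
    ratio = ratio.rename("ratio")

    # Map the per-(s, p, f) result back onto the original rows.
    return A.join(ratio, on=["s", "p", "f"])
```

Calling `A = add_ratio(A)` should then play the role of the loop. The idea is that the expensive filtering is done once per group rather than once per row, but I have not timed it on the full dataset.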