I have two DataFrames.
My main DataFrame is dffinal:
            date  id   och  och1  och2  och3  cch1  LCH   L#
    0  3/27/2020   1  -2.1     3     3     1     5  NaN  NaN
    1   4/9/2020   2   2.0     1     2     1     3  NaN  NaN
My second DataFrame is df2:
            date   och   cch  och1  och2  och3  cch1
    0  5/30/2012  -0.7  -0.7     3    -1     1    56
    1  9/16/2013   0.9  -1.0     6     4     3     7
    2  9/26/2013   2.5   5.4     2     3     2     4
    3  8/26/2016   0.1  -0.7     4     3     5    10
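I have this loop:

    for i in dffinal.index:
        # Start from a fresh copy of df2 for every row of dffinal
        df3 = df2.copy()
        # Keep only the df2 rows that exceed row i of dffinal in each column
        df3 = df3[df3['och1'] > dffinal.loc[i, 'och1']]
        df3 = df3[df3['och2'] > dffinal.loc[i, 'och2']]
        df3 = df3[df3['och3'] > dffinal.loc[i, 'och3']]
        df3 = df3[df3['cch1'] > dffinal.loc[i, 'cch1']]
        # Write with .loc rather than chained indexing (dffinal['LCH'][i] = ...)
        dffinal.loc[i, 'LCH'] = df3['och'].mean()
        dffinal.loc[i, 'L#'] = len(df3.index)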
As the code shows, for each row of dffinal the loop filters df2 (into df3) down to the rows whose och1, och2, och3 and cch1 are all greater than that row's values. LCH is then the mean of och over the filtered rows, and L# is the number of filtered rows.
This code works, but it is very slow. I understand that vectorizing it with pandas should make it faster, but I could not figure out how to do that for my case.
This is my desired result:

            date  id   och  och1  och2  och3  cch1       LCH   L#
    0  3/27/2020   1  -2.1     3     3     1     5  0.900000  1.0
    1   4/9/2020   2   2.0     1     2     1     3  1.166667  3.0
I would greatly appreciate any help making this code more efficient.
Correct answer
I personally use the easy method from @shadowtalker's answer, simply because I can understand how it works.
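For readers who want something concrete, here is a minimal sketch of one readable vectorized approach: a cross join followed by a filter and a groupby. It is not necessarily the method from the answer mentioned above. It assumes pandas 1.2 or later (for `how="cross"`), and the names `cols`, `left`, `pairs`, `keep`, `stats` and the `"row"` index label are my own.

```python
import pandas as pd

dffinal = pd.DataFrame({
    "date": ["3/27/2020", "4/9/2020"],
    "id": [1, 2],
    "och": [-2.1, 2.0],
    "och1": [3, 1], "och2": [3, 2], "och3": [1, 1], "cch1": [5, 3],
})
df2 = pd.DataFrame({
    "date": ["5/30/2012", "9/16/2013", "9/26/2013", "8/26/2016"],
    "och": [-0.7, 0.9, 2.5, 0.1],
    "cch": [-0.7, -1.0, 5.4, -0.7],
    "och1": [3, 6, 2, 4], "och2": [-1, 4, 3, 3],
    "och3": [1, 3, 2, 5], "cch1": [56, 7, 4, 10],
})

cols = ["och1", "och2", "och3", "cch1"]  # columns compared with ">"

# Pair every dffinal row with every df2 row; "row" remembers the dffinal index.
left = dffinal[cols].rename_axis("row").reset_index()
pairs = left.merge(df2[cols + ["och"]], how="cross", suffixes=("", "_df2"))

# Keep the pairs where df2 is strictly greater in all four columns.
df2_vals = pairs[[c + "_df2" for c in cols]].to_numpy()
keep = (df2_vals > pairs[cols].to_numpy()).all(axis=1)

# Mean of och and number of matches per dffinal row.
stats = pairs[keep].groupby("row")["och"].agg(["mean", "size"])

# Rows without any match get NaN for the mean and 0 for the count.
dffinal["LCH"] = stats["mean"].reindex(dffinal.index)
dffinal["L#"] = stats["size"].reindex(dffinal.index, fill_value=0)
```

On the sample data this reproduces the desired LCH and L# values, except that L# comes out as an integer column. The cross join holds len(dffinal) * len(df2) rows in memory, so it may not suit very large frames.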
The most efficient answer is fast but complex.
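To give an idea of what a faster but denser solution can look like, here is a sketch using NumPy broadcasting. It is not necessarily the answer referred to above. It assumes that och contains no NaN values and that a boolean array of shape (len(dffinal), len(df2)) fits in memory. The names `a`, `b`, `mask`, `counts` and `sums` are my own.

```python
import numpy as np
import pandas as pd

dffinal = pd.DataFrame({
    "date": ["3/27/2020", "4/9/2020"],
    "id": [1, 2],
    "och": [-2.1, 2.0],
    "och1": [3, 1], "och2": [3, 2], "och3": [1, 1], "cch1": [5, 3],
})
df2 = pd.DataFrame({
    "date": ["5/30/2012", "9/16/2013", "9/26/2013", "8/26/2016"],
    "och": [-0.7, 0.9, 2.5, 0.1],
    "cch": [-0.7, -1.0, 5.4, -0.7],
    "och1": [3, 6, 2, 4], "och2": [-1, 4, 3, 3],
    "och3": [1, 3, 2, 5], "cch1": [56, 7, 4, 10],
})

cols = ["och1", "och2", "och3", "cch1"]
a = dffinal[cols].to_numpy()   # shape (n, 4): thresholds, one row per dffinal row
b = df2[cols].to_numpy()       # shape (m, 4): candidate rows from df2
och = df2["och"].to_numpy()    # shape (m,); assumed to contain no NaN

# mask[i, j] is True when df2 row j exceeds dffinal row i in all four columns.
mask = (b[None, :, :] > a[:, None, :]).all(axis=2)   # shape (n, m)

counts = mask.sum(axis=1)                      # number of matches per dffinal row
sums = np.where(mask, och, 0.0).sum(axis=1)    # sum of och over the matches

# Divide only where there is at least one match; leave NaN elsewhere.
dffinal["LCH"] = np.divide(
    sums, counts, out=np.full(len(counts), np.nan), where=counts > 0
)
dffinal["L#"] = counts
```

The whole comparison happens in a single array operation, which is where the speed-up over the row-by-row loop would come from. The price is that the three-dimensional indexing is harder to read.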