Given two dataframes df_1 and df_2, how can I aggregate the values of df_2 into the rows of df_1 such that date in df_1 is between open and close in df_2?
print(df_1)
date A B
0 2021-11-01 0.020228 0.026572
1 2021-11-02 0.057780 0.175499
2 2021-11-03 0.098808 0.620986
3 2021-11-04 0.158789 1.014819
4 2021-11-05 0.038129 2.384590
print(df_2)
open close location division size
0 2021-11-07 2021-11-14 LDN Alpha 120
1 2021-11-01 2021-11-14 PRS Alpha 450
2 2021-10-14 2021-11-27 HK Beta 340
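For reproducibility, the two sample frames can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the printed output above, and it assumes that date, open and close are stored as datetime64 columns, which the printout alone does not confirm.

```python
import pandas as pd

# Sample data copied from the printed frames above.
# Assumption: the date-like columns are datetime64, not strings.
df_1 = pd.DataFrame({
    "date": pd.to_datetime(["2021-11-01", "2021-11-02", "2021-11-03",
                            "2021-11-04", "2021-11-05"]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})

df_2 = pd.DataFrame({
    "open": pd.to_datetime(["2021-11-07", "2021-11-01", "2021-10-14"]),
    "close": pd.to_datetime(["2021-11-14", "2021-11-14", "2021-11-27"]),
    "location": ["LDN", "PRS", "HK"],
    "division": ["Alpha", "Alpha", "Beta"],
    "size": [120, 450, 340],
})
```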
I have tried this solution for joining my dataframes; now I need to find a way to aggregate. What I have done so far is:
df_2.index = pd.IntervalIndex.from_arrays(df_2['open'],df_2['close'],closed='both')
df_1['events'] = df_1['date'].apply(lambda x : df_2.iloc[df_2.index.get_loc(x)])
print(calls['code'].iloc[0].groupby(['location', 'division'])['size'].sum())
location division
LDN Alpha 421.0
LDN Beta 515.0
NY Alpha 369.0
PRQ Alpha 132.0
Gamma 110.0
I need something that looks like this:
date A B LDN_Alpha LDN_Beta LDN_Gamma PRS_Alpha ...
0 2021-11-01 0.020228 0.026572 120 300 0 530
1 2021-11-02 0.057780 0.175499 ...
2 2021-11-03 0.098808 0.620986
3 2021-11-04 0.158789 1.014819
4 2021-11-05 0.038129 2.384590
Where the created columns are the sum of size grouped by location and division.
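One possible way to get that shape, sketched here under assumptions rather than as a confirmed solution: pair every date with every event through a cross join, keep the pairs where date falls inside [open, close] (inclusive, matching closed='both' above), then pivot the summed size into one column per location/division pair. The helper name aggregate_open_events and the temporary key column are hypothetical, and the date-like columns are assumed to be datetime64.

```python
import pandas as pd


def aggregate_open_events(df_1, df_2):
    """Hypothetical helper: per date, sum `size` of the events open on that date."""
    # Pair every distinct date with every event (how="cross" needs pandas >= 1.2).
    pairs = df_1[["date"]].drop_duplicates().merge(df_2, how="cross")

    # Keep only pairs where open <= date <= close (both ends inclusive).
    pairs = pairs[pairs["date"].between(pairs["open"], pairs["close"])]
    if pairs.empty:
        # No date falls inside any interval, so there is nothing to add.
        return df_1.copy()

    # Build the "<location>_<division>" labels used as new column names.
    pairs = pairs.assign(key=pairs["location"] + "_" + pairs["division"])

    # One row per date, one column per key, cells hold the summed size.
    wide = pairs.pivot_table(index="date", columns="key", values="size",
                             aggfunc="sum", fill_value=0)
    wide.columns.name = None

    # Attach the new columns to df_1; dates without any open event get 0.
    out = df_1.merge(wide, how="left", left_on="date", right_index=True)
    new_cols = list(wide.columns)
    out[new_cols] = out[new_cols].fillna(0)
    return out
```

Calling aggregate_open_events(df_1, df_2) returns df_1 with the extra columns appended. A location/division pair that is not open on any date in df_1 produces no column at all, so a reindex over the full set of pairs would be needed if all-zero columns are wanted. The cross join builds len(df_1) * len(df_2) intermediate rows, which may be too much for large inputs.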