I have two data_frames as follows:
ID. Date1 X1
0002 2015-08-01 81
0002 2015-08-15 95
0003 2015-10-01 115.
0002 2015-09-15 126
0003 2015-10-15 129
0002 2016-02-01 265
0002 2016-02-15 279.
0003 2016-03-15 281.
0003 2016-03-15 281.
0002 2016-03-01 294
0003 2016-04-01 298.
0003 2016-04-01 298.
0002 2016-03-15 308
0003 2017-02-15 618
0002 2017-02-01 631.
0003 2017-03-01 632
0002 2017-02-15 645
And the second as fallows:
ID. DATE2.
0002 2016-02-11
0003 2016-03-16
0002 2017-01-19
0003 2015-09-10
0003 2016-02-19
0003 2016-03-22
0003 2015-12-07
Now, I want to join –merge-- these two data_frameworks into one data_frame based on ID and Data1 and Date2 with certain conditions. So my desired dataframe should be like these:
ID Date1 X1 Date2
0002 2015-08-01 81 Nan
0002 2015-08-15 95 Nan
0003 2015-10-01 115. 2015-09-10
0002 2015-09-15 126 Nan
0003 2015-10-15 129 Nan
0002 2016-02-01 265 Nan
0002 2016-02-15 279. 2016-02-11
0003 2016-03-15 281. 2015-12-07
0003 2016-03-15 281. 2016-02-19
0003 2016-04-01 298. 2016-03-16
0003 2016-04-01 298. 2016-03-22
0002 2016-03-15 308 Nan
0003 2017-02-15 618 Nan
0002 2017-02-01 631. 2017-01-19
0003 2017-03-01 632 Nan
0002 2017-02-15 645 Nan
Lets say Date1 points to date of the collected data, and Date2 refers to an event. I need to show in the output DataFrame, whether an event accrued between the two dates of collected data. E.g., for ID:003 between 2015-10-01 and 2015-10-15 whether an event happened or not? In this example the difference is 15 days, but we have case that the difference between the collected data is more than one month e.g., ID:002 2015-10-15 and 2016-02-15. In addition, if between two dates (in Date1) two events happened I want to show them in a different rows like the above example. Does anyone can help to handle this?
I also tried this code:
myf = dataframe1.merge(dataframe2, how='left',on=['ID'])
mask = (pd.to_datetime(myf['Data1'], format='%Y-%m-%d') -
pd.to_datetime(myf['Data2'], format='%Y-%m-%d')).apply(lambda I: i.days <= 15 and i.days > 0)
myf.loc[~mask, 'Data2'] = np.nan
But, it solves only if the data are collected (in dataframe1) every 15 days, which in my case it is not always true.