0

I have two data_frames as follows:

ID.           Date1        X1    
0002       2015-08-01      81
0002       2015-08-15      95
0003       2015-10-01      115.    
0002       2015-09-15      126
0003       2015-10-15      129
0002       2016-02-01      265
0002       2016-02-15      279.      
0003       2016-03-15      281.    
0003       2016-03-15      281.     
0002       2016-03-01      294
0003       2016-04-01      298.       
0003       2016-04-01      298.       
0002       2016-03-15      308
0003       2017-02-15      618
0002       2017-02-01      631.        
0003       2017-03-01      632
0002       2017-02-15      645

And the second as fallows:

       ID.      DATE2.    
      0002   2016-02-11
  0003   2016-03-16
  0002   2017-01-19
  0003   2015-09-10
  0003   2016-02-19
  0003   2016-03-22
  0003   2015-12-07

Now, I want to join –merge-- these two data_frameworks into one data_frame based on ID and Data1 and Date2 with certain conditions. So my desired dataframe should be like these:

ID       Date1                  X1     Date2
0002       2015-08-01           81     Nan  
0002       2015-08-15           95     Nan
0003       2015-10-01           115.   2015-09-10
0002       2015-09-15           126    Nan
0003       2015-10-15           129    Nan
0002       2016-02-01           265    Nan
0002       2016-02-15           279.   2016-02-11
0003       2016-03-15           281.   2015-12-07
0003       2016-03-15           281.   2016-02-19
0003       2016-04-01           298.   2016-03-16
0003       2016-04-01           298.   2016-03-22
0002       2016-03-15           308    Nan
0003       2017-02-15           618    Nan
0002       2017-02-01           631.   2017-01-19
0003       2017-03-01           632    Nan
0002       2017-02-15           645    Nan

Lets say Date1 points to date of the collected data, and Date2 refers to an event. I need to show in the output DataFrame, whether an event accrued between the two dates of collected data. E.g., for ID:003 between 2015-10-01 and 2015-10-15 whether an event happened or not? In this example the difference is 15 days, but we have case that the difference between the collected data is more than one month e.g., ID:002 2015-10-15 and 2016-02-15. In addition, if between two dates (in Date1) two events happened I want to show them in a different rows like the above example. Does anyone can help to handle this?

I also tried this code:

    myf = dataframe1.merge(dataframe2, how='left',on=['ID'])
    mask = (pd.to_datetime(myf['Data1'], format='%Y-%m-%d') -             
    pd.to_datetime(myf['Data2'], format='%Y-%m-%d')).apply(lambda I: i.days <= 15 and i.days > 0)
    myf.loc[~mask, 'Data2'] = np.nan

But, it solves only if the data are collected (in dataframe1) every 15 days, which in my case it is not always true.

Spedo
  • 355
  • 3
  • 13
  • This is a left outer merge, no? `df1.merge(df2, left_on=['ID', 'Date1'], right_on=['ID', 'Date2'], how='left')` – cs95 Dec 28 '18 at 12:35
  • No, it does not work , because the values of the DAta1 in --dataframe 1-- are not identical with the values of Data2 in the second DataFrame. I also tried other code which I mentioned above -- I have edited my question--, but that one also did not solve the entire problem. – Spedo Dec 28 '18 at 13:29
  • Fair enough. Seems like you may have needed a combination of merge and merge_asof. Reopened. – cs95 Dec 28 '18 at 13:41

0 Answers0