Similar to one of my previous questions (Merge dataframes on nearest datetime / timestamp), I would like to merge two pandas data frames on two datetime columns using nearest match:
Let A and B be two dataframes as follows:
A = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "init_date":["01/01/2015","07/02/2014","08/02/1999","01/01/1991","06/22/2014"], "fin_date":["04/16/1923","09/24/1945","06/24/1952","11/26/1988","10/05/1990"]})
In [15]: A
Out[15]:
ID fin_date init_date
0 A 04/16/1923 01/01/2015
1 A 09/24/1945 07/02/2014
2 C 06/24/1952 08/02/1999
3 B 11/26/1988 01/01/1991
4 B 10/05/1990 06/22/2014
B = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"],"fin_date":["12/10/1926","01/01/1944","08/21/1955","12/12/1987","11/05/1991"], "value": ["3","5","1","7","8"] })
In [11]: B
Out[11]:
ID date fin_date value
0 A 02/15/2015 12/10/1926 3
1 A 06/30/2014 01/01/1944 5
2 C 07/02/1999 08/21/1955 1
3 B 10/05/1990 12/12/1987 7
4 B 06/24/2014 11/05/1991 8
The resulting data frame should look like the following:
In [21]: C
Out[21]:
ID fin_date init_date value
0 A 04/16/1923 01/01/2015 3
1 A 09/24/1945 07/02/2014 5
2 C 06/24/1952 08/02/1999 1
3 B 11/26/1988 01/01/1991 7
4 B 10/05/1990 06/22/2014 8
The general problem would potentially not have a close match for neither init_date nor fin_date, however, I would also be interested in a solution when there are exact matches for init_date, for example.
Note that one difficulty is that one match might be closer to the value in init_date than on final date, while a competing match might be the opposite. In that case, I prefer the one that is closer to init_date. To my knowledge, after attempting a similar approach as the one in the link, I get that reindexing by "nearest" is not implemented for multi-indexing.
Thank you and I appreciate your help,