I have the following dataframes
import pandas as pd
foo1 = pd.DataFrame({'id':[1,1,2,2],
'phase':['Pre','Post','Pre','Post'],
'date_start': ['2022-07-24', '2022-12-25', '2022-09-30', '2022-12-25'],
'date_end': ['2022-07-30', '2023-03-07', '2022-10-05', '2023-03-04']})
foo2 = pd.DataFrame({'id': [1,1,1,1,
2,2,2,2],
'date': ['2022-07-24', '2022-07-25', '2022-12-26', '2023-01-01',
'2022-10-04', '2022-11-25', '2022-12-26', '2023-03-01']})
print(foo1, '\n' ,foo2)
id phase date_start date_end
0 1 Pre 2022-07-24 2022-07-30
1 1 Post 2022-12-25 2023-03-07
2 2 Pre 2022-09-30 2022-10-05
3 2 Post 2022-12-25 2023-03-04
id date
0 1 2022-07-24
1 1 2022-07-25
2 1 2022-12-26
3 1 2023-01-01
4 2 2022-10-04
5 2 2022-11-25
6 2 2022-12-26
7 2 2023-03-01
I would like to get the phase
column in foo2
by merging on id
and if date
is between date_start
and date_end
. If date
is not within the range [date_start,date_end]
then the phase column should have NaN
The resulting dataframe should look like this:
id date phase
0 1 2022-07-24 Pre
1 1 2022-07-25 Pre
2 1 2022-12-26 Post
3 1 2023-01-01 Post
4 2 2022-10-04 Pre
5 2 2022-11-25 NaN
6 2 2022-12-26 Post
7 2 2023-03-01 Post
How could I do that ?
I found this but it does not include "merging as well with id
"