Here's a way to do it.
In the sessions
dataframe, set date
column to be the index:
sessions = sessions.set_index('date')
Sort sessions by index (that is, by date):
sessions = sessions.loc[sessions.index.sort_values()]
Add a session_evaluated
column to evaluations which will contain the date of the session that the evaluation applies to. We calculate this by first calling sessions.index.get_indexer()
on the date
column of evaluations with the method
argument set to 'pad' so we "round down" on non-matching dates, and then doing a lookup on these integer index values in the sessions index (which contains the session dates):
evaluations['session_evaluated'] = pd.Series([sessions.index.to_list()[i]
for i in sessions.index.get_indexer(evaluations['date'], method='pad')])
Here's what it looks like all put together with sample inputs:
import pandas as pd
sessions = pd.DataFrame({
'date' : ['2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-01-01'],
'topic' : ['Easy 1', 'Easy 2', 'Intermediate', 'Advanced', 'Intro']
})
evaluations = pd.DataFrame({
'date' : [
'2022-01-05', '2022-01-10', '2022-01-15', '2022-01-20', '2022-01-25',
'2022-02-01', '2022-02-05', '2022-02-28',
'2022-03-01', '2022-03-15', '2022-03-31',
'2022-04-01', '2022-04-15'
],
'rating' : [9,8,7,8,9,5,4,3,10,10,10,2,4]
})
sessions['date'] = pd.to_datetime(sessions['date'])
evaluations['date'] = pd.to_datetime(evaluations['date'])
sessions = sessions.set_index('date')
sessions = sessions.loc[sessions.index.sort_values()]
print(sessions)
print(evaluations)
evaluations['session_evaluated'] = pd.Series([sessions.index.to_list()[i]
for i in sessions.index.get_indexer(evaluations['date'], method='pad')])
print(evaluations)
Results:
topic
date
2022-01-01 Intro
2022-02-01 Easy 1
2022-03-01 Easy 2
2022-04-01 Intermediate
2022-05-01 Advanced
date rating
0 2022-01-05 9
1 2022-01-10 8
2 2022-01-15 7
3 2022-01-20 8
4 2022-01-25 9
5 2022-02-01 5
6 2022-02-05 4
7 2022-02-28 3
8 2022-03-01 10
9 2022-03-15 10
10 2022-03-31 10
11 2022-04-01 2
12 2022-04-15 4
date rating session_evaluated
0 2022-01-05 9 2022-01-01
1 2022-01-10 8 2022-01-01
2 2022-01-15 7 2022-01-01
3 2022-01-20 8 2022-01-01
4 2022-01-25 9 2022-01-01
5 2022-02-01 5 2022-02-01
6 2022-02-05 4 2022-02-01
7 2022-02-28 3 2022-02-01
8 2022-03-01 10 2022-03-01
9 2022-03-15 10 2022-03-01
10 2022-03-31 10 2022-03-01
11 2022-04-01 2 2022-04-01
12 2022-04-15 4 2022-04-01
UPDATED:
Here's another way to do it using the merge_asof()
function. It doesn't require the date column to be the index (though it does require that both dataframe arguments be sorted by date
):
sessions['date'] = pd.to_datetime(sessions['date'])
evaluations['date'] = pd.to_datetime(evaluations['date'])
evaluations = pd.merge_asof(
evaluations.sort_values(by=['date']),
sessions.sort_values(by=['date'])['date'].to_frame().assign(session_evaluated=sessions['date']),
on='date')
print(evaluations)
Output:
date rating session_evaluated
0 2022-01-05 9 2022-01-01
1 2022-01-10 8 2022-01-01
2 2022-01-15 7 2022-01-01
3 2022-01-20 8 2022-01-01
4 2022-01-25 9 2022-01-01
5 2022-02-01 5 2022-02-01
6 2022-02-05 4 2022-02-01
7 2022-02-28 3 2022-02-01
8 2022-03-01 10 2022-03-01
9 2022-03-15 10 2022-03-01
10 2022-03-31 10 2022-03-01
11 2022-04-01 2 2022-04-01
12 2022-04-15 4 2022-04-01
UPDATE #2:
The call to assign()
in the above code can also be written using **kwargs
syntax, in case we want to use a column name with spaces or that otherwise is not a valid python identifier (instead of session_evaluated
). For example:
evaluations = pd.merge_asof(
evaluations.sort_values(by=['date']),
sessions.sort_values(by=['date'])['date'].to_frame()
.assign(**{'Evaluated Session (Date)' : lambda x: sessions['date']}),
on='date')
Output:
date rating Evaluated Session (Date)
0 2022-01-05 9 2022-01-01
1 2022-01-10 8 2022-01-01
2 2022-01-15 7 2022-01-01
3 2022-01-20 8 2022-01-01
4 2022-01-25 9 2022-01-01
5 2022-02-01 5 2022-02-01
6 2022-02-05 4 2022-02-01
7 2022-02-28 3 2022-02-01
8 2022-03-01 10 2022-03-01
9 2022-03-15 10 2022-03-01
10 2022-03-31 10 2022-03-01
11 2022-04-01 2 2022-04-01
12 2022-04-15 4 2022-04-01