I have two tables, x
and y
. x
has customer_id
and reporting_date
and y
contains customer_id
and trigger_date
.
For each row in x
I want to join the row in y
whose trigger_date
is the nearest to the reporting_date
in x
. I have hard-coded my desired result as z
in the example below, which should run in the terminal out-of-the-box.
import pandas as pd
import numpy as np
x = pd.DataFrame({
'customer_id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4],
'reporting_date': [
'2021-09-01',
'2021-10-01',
'2021-11-01',
'2019-01-01',
'2019-02-01',
'2021-12-01',
'2022-01-01',
'2022-02-01',
'2022-03-01',
'2022-04-01',
'2022-07-01',
'2022-08-01',
'2022-09-01'
],
'payment_status': [0, 0, 2, 0, 2, 0, 0, 0, 0, 2, 0, 0, 2]
})
y = pd.DataFrame({
'customer_id': [1, 1, 3, 3, 3, 4],
'trigger_date': [
'2019-08-21',
'2021-09-17',
'2020-01-01',
'2022-01-03',
'2022-03-08',
'2022-10-01']})
# DESIRED RESULT IS z:
z = x.__deepcopy__()
z['nearest_trigger_date'] = [
'2019-09-17',
'2019-09-17',
'2019-09-17',
np.nan,
np.nan,
'2022-01-03',
'2022-01-03',
'2022-01-03',
'2022-03-08',
'2022-03-08',
'2022-10-01',
'2022-10-01',
'2022-10-01' ]
x['reporting_date'] = x['reporting_date'].apply(pd.to_datetime)
y['trigger_date'] = y['trigger_date'].apply(pd.to_datetime)
z['nearest_trigger_date'] = z['nearest_trigger_date'].apply(pd.to_datetime)
EDIT This is not a duplicate of Merging series of pandas dataframe into single dataframe. That is a concatenation, not a "merge" or "join". And pd.merge_asof only supports merging on one column.