0

I have two tables, x and y. x has customer_id and reporting_date and y contains customer_id and trigger_date.

For each row in x I want to join the row in y whose trigger_date is the nearest to the reporting_date in x. I have hard-coded my desired result as z in the example below, which should run in the terminal out-of-the-box.

import pandas as pd
import numpy as np

x = pd.DataFrame({
    'customer_id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4],
    'reporting_date': [
        '2021-09-01',
        '2021-10-01',
        '2021-11-01',
        '2019-01-01',
        '2019-02-01',
        '2021-12-01',
        '2022-01-01',
        '2022-02-01',
        '2022-03-01',
        '2022-04-01',
        '2022-07-01',
        '2022-08-01',
        '2022-09-01' 
    ],
    'payment_status': [0, 0, 2, 0, 2, 0, 0, 0, 0, 2, 0, 0, 2]
})

y = pd.DataFrame({
    'customer_id': [1, 1, 3, 3, 3, 4],
    'trigger_date': [
        '2019-08-21',
        '2021-09-17',
        '2020-01-01',
        '2022-01-03',
        '2022-03-08',
        '2022-10-01']})

# DESIRED RESULT IS z:
z = x.__deepcopy__()
z['nearest_trigger_date'] = [
    '2019-09-17',
    '2019-09-17',
    '2019-09-17',
    np.nan,
    np.nan,
    '2022-01-03',
    '2022-01-03',
    '2022-01-03',
    '2022-03-08',
    '2022-03-08',
    '2022-10-01',
    '2022-10-01',
    '2022-10-01' ]

x['reporting_date'] = x['reporting_date'].apply(pd.to_datetime)
y['trigger_date'] = y['trigger_date'].apply(pd.to_datetime)
z['nearest_trigger_date'] = z['nearest_trigger_date'].apply(pd.to_datetime)

EDIT This is not a duplicate of Merging series of pandas dataframe into single dataframe. That is a concatenation, not a "merge" or "join". And pd.merge_asof only supports merging on one column.

Derek Fulton
  • 306
  • 1
  • 14

0 Answers0