My question is similar to the one in this post: pandas merging based on a timestamp which do not match exactly
However, I need a many-to-one match while keeping the functionality of pandas.merge_asof().
I have two dataframes, df1 & df2.
import pandas as pd
import numpy as np
from io import StringIO
dtc = [['CALL_DATE']]
df1 = pd.read_csv(StringIO(u'''
CALL_DATE,customer,status
2017-01-03 14:12:58,70892,P
2017-01-06 20:00:25,70892,P
2017-01-07 09:42:58,70892,X
2017-01-03 13:56:41,70928,N
2017-01-07 15:16:26,70928,C
2017-01-03 15:39:11,71075,U
2017-01-03 15:46:29,71075,N
'''))
df2 = pd.read_csv(StringIO(u'''
CALL_DATE,customer,Note
2017-01-03 14:09:00,70892,Call to return
2017-01-06 19:59:00,70892,Wrong Item shipped
2017-01-07 09:36:00,70892,Survey denied
2017-01-03 13:56:00,70928,TGGT
2017-01-03 13:53:00,70928,Open issue
2017-01-03 13:56:00,70928,No Record of listings
2017-01-07 15:15:00,70928,Need Translator
2017-01-07 15:16:00,70928,rescheduled appointment
2017-01-03 15:39:11,71075,New Contact
2017-01-03 15:46:29,71075,open membership
2017-01-03 15:46:29,71075,recurring delivery scheduled
'''))
df1['CALL_DATE'] = pd.to_datetime(df1['CALL_DATE'], format = '%Y-%m-%d %H:%M:%S')
df2['CALL_DATE'] = pd.to_datetime(df2['CALL_DATE'], format = '%Y-%m-%d %H:%M:%S')
The two dataframes need to be merged so that the end result looks something like this:
df3 = pd.read_csv(StringIO(u'''
2017-01-03 14:12:58,70892,P,2017-01-03 14:09:00,Call to return
2017-01-06 20:00:25,70892,P,2017-01-06 19:59:00,Wrong Item shipped
2017-01-07 09:42:58,70892,X,2017-01-07 09:36:00,Survey denied
2017-01-03 13:56:41,70928,N,2017-01-03 13:56:00,TGGT
2017-01-03 13:56:41,70928,N,2017-01-03 13:53:00,Open issue
2017-01-03 13:56:41,70928,N,2017-01-03 13:56:00,No Record of listings
2017-01-07 15:16:26,70928,C,2017-01-07 15:15:00,Need Translator
2017-01-07 15:16:26,70928,C,2017-01-07 15:16:00,rescheduled appointment
2017-01-03 15:39:11,71075,U,2017-01-03 15:39:11,New Contact
2017-01-03 15:46:29,71075,N,2017-01-03 15:46:29,open membership
2017-01-03 15:46:29,71075,N,2017-01-03 15:46:29,recurring delivery scheduled
'''), header=None)
In the sample data provided the time differences are really small, but there are plenty of cases where the difference can be several hours, up to almost a whole day. I am trying to match each note with the closest df1 entry for the same customer. Also, df2 entries can come before or after (time-wise) the df1 entries.
When I use pandas.merge_asof(), it just does a one-to-one merge, and I lose notes that should go with a customer's file.
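To make the problem concrete, here is a minimal sketch that reproduces the one-to-one behaviour and then tries one possible workaround: swapping the two frames so that the notes are the left side of merge_asof(). Since merge_asof() returns one row per left row, this keeps every note. The column name NOTE_DATE is only a name I made up for the example, and direction='nearest' is my assumption about what "closest" should mean here. I have not checked this against the full data.

```python
import pandas as pd
from io import StringIO

df1 = pd.read_csv(StringIO('''
CALL_DATE,customer,status
2017-01-03 14:12:58,70892,P
2017-01-06 20:00:25,70892,P
2017-01-07 09:42:58,70892,X
2017-01-03 13:56:41,70928,N
2017-01-07 15:16:26,70928,C
2017-01-03 15:39:11,71075,U
2017-01-03 15:46:29,71075,N
'''), parse_dates=['CALL_DATE'])

df2 = pd.read_csv(StringIO('''
CALL_DATE,customer,Note
2017-01-03 14:09:00,70892,Call to return
2017-01-06 19:59:00,70892,Wrong Item shipped
2017-01-07 09:36:00,70892,Survey denied
2017-01-03 13:56:00,70928,TGGT
2017-01-03 13:53:00,70928,Open issue
2017-01-03 13:56:00,70928,No Record of listings
2017-01-07 15:15:00,70928,Need Translator
2017-01-07 15:16:00,70928,rescheduled appointment
2017-01-03 15:39:11,71075,New Contact
2017-01-03 15:46:29,71075,open membership
2017-01-03 15:46:29,71075,recurring delivery scheduled
'''), parse_dates=['CALL_DATE'])

# merge_asof needs both frames sorted by their time key.
# NOTE_DATE is a hypothetical name, used so both timestamps survive the merge.
calls = df1.sort_values('CALL_DATE')
notes = df2.rename(columns={'CALL_DATE': 'NOTE_DATE'}).sort_values('NOTE_DATE')

# What I have now: df1 on the left, so each df1 row gets at most one note.
one_to_one = pd.merge_asof(
    calls, notes,
    left_on='CALL_DATE', right_on='NOTE_DATE',
    by='customer', direction='nearest',
)
print(len(one_to_one))   # 7, one row per df1 row, so some notes are dropped

# Possible workaround: notes on the left, so each note finds its nearest
# df1 entry for the same customer (earlier or later in time).
many_to_one = pd.merge_asof(
    notes, calls,
    left_on='NOTE_DATE', right_on='CALL_DATE',
    by='customer', direction='nearest',
)
many_to_one = many_to_one.sort_values(['customer', 'CALL_DATE', 'NOTE_DATE'])
print(many_to_one[['CALL_DATE', 'customer', 'status', 'NOTE_DATE', 'Note']])
```

On the sample data the swapped version returns all 11 notes, each paired with a single df1 row. When time gaps can reach almost a whole day, a tolerance=pd.Timedelta(...) argument might be worth adding to avoid matching a note to an unrelated entry, but the right cutoff depends on the data.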