I am trying to merge 2 dataframes, with exact matching on some columns and as_of
matching on some other column (typically a date). The intention is very well described in this post (I'll copy paste the main content below):
Pandas: Approximate join on one column, exact match on other columns
The post above was answered ; only it dates back from 2016, before the introduction of pandas.merge_asof
. I believe there can be an easier answer now that it's been released. Brutal approach would be to merge as_of for each group of rows with the same values for the cols on which I want to merge exactly on. But is there a more elegant version?
Precise description of desired input and outputs:
Inputs
df1 = pd.DataFrame({'index': ['a1','a2','a3','a4'], 'col1': ['1232','432','432','123'], 'col2': ['asd','dsa12','dsa12','asd2'], 'col3': ['1','2','2','3'], 'date': ['2010-01-23','2016-05-20','2010-06-20','2008-10-21'],}).set_index('index')
df1
Out[430]:
col1 col2 col3 date
index
a1 1232 asd 1 2010-01-23
a2 432 dsa12 2 2016-05-20
a3 432 dsa12 2 2010-06-20
a4 123 asd2 3 2008-10-21
df2 = pd.DataFrame({'index': ['b1','b2','b3','b4'], 'col1': ['132','432','432','123'], 'col2': ['asd','dsa12','dsa12','sd2'], 'col3': ['1','2','2','3'], 'date': ['2010-01-23','2016-05-23','2010-06-10','2008-10-21'],}).set_index('index')
df2
Out[434]:
col1 col2 col3 date b_col
index
b1 132 asd 1 2010-01-23 1
b2 432 dsa12 2 2016-05-23 2
b3 432 dsa12 2 2010-06-10 3
b4 123 sd2 3 2008-10-21 4
Outputs:
col1 col2 col3 date b_col
index
a2 432 dsa12 2 2016-05-20 2
a3 432 dsa12 2 2010-06-20 3
NOTE 1: the reason why I need to do this is that I need something like groupby(...)[...].rolling(...).transform(...)
with latency which doesn't seem to exist yet, unless I am missing something?
NOTE 2: I want to avoid computing all couples and then filtering as the dataframe may get too big.