How can I make a merge of two data frames taking The first row that match or the next one

Question

I want to make a merge of two data frames.

One that looks like this

Df1

ID date    A
1  201901  5
2  201902  6

Df2

ID date2   B
1  201812  3
1  201811  2
1  201810  1

And I want a merge with the ID and add the columns with the most recent date2 backwards to date1 e.g

ID date   A  Date 2 B
1  201901 5  201812 3
1  201902 6  201812 3

Any ideas?

and why is ID 1 repeated on your output? it doest match ur logic. — Umar.H, Dec 15 '19 at 20:47

Umar.H · Answer 1 · 2019-12-16T00:10:16.287

Your logic doesn't match your output, but this should work :

first lets use a function that helps us find the nearest value using numpy, we will iterate over your dataframe column and create a dict.

full credits to Ubuntu for this function :

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]




df1['date2'] = df1['date'].apply(lambda x : find_nearest(df2['date2'],x))
print(df1)
   ID    date  A   date2
0   1  201901  5  201812
1   2  201902  6  201812

now we create the date2 column using apply ` in df1 and merge on this.

new_df = pd.merge(df1,df2,on=['date2','ID'],how='left') # change behavior to get your result.
print(new_df)
    ID date    A   date2    B
0   1  201901  5  201812  3.0
1   2  201902  6  201812  NaN

How can I make a merge of two data frames taking The first row that match or the next one

1 Answers1