0

I want to make a merge of two data frames.

One that looks like this

Df1

ID date    A
1  201901  5
2  201902  6

Df2

ID date2   B
1  201812  3
1  201811  2
1  201810  1

And I want a merge with the ID and add the columns with the most recent date2 backwards to date1 e.g

ID date   A  Date 2 B
1  201901 5  201812 3
1  201902 6  201812 3

Any ideas?

niraj
  • 17,498
  • 4
  • 33
  • 48
Emm
  • 1
  • 1

1 Answers1

0

Your logic doesn't match your output, but this should work :

first lets use a function that helps us find the nearest value using numpy, we will iterate over your dataframe column and create a dict.

full credits to Ubuntu for this function :

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]




df1['date2'] = df1['date'].apply(lambda x : find_nearest(df2['date2'],x))
print(df1)
   ID    date  A   date2
0   1  201901  5  201812
1   2  201902  6  201812

now we create the date2 column using apply ` in df1 and merge on this.

new_df = pd.merge(df1,df2,on=['date2','ID'],how='left') # change behavior to get your result.
print(new_df)
    ID date    A   date2    B
0   1  201901  5  201812  3.0
1   2  201902  6  201812  NaN
Umar.H
  • 22,559
  • 7
  • 39
  • 74