1

I want to combine 2 data-frames together using the following databases (1) & (2). Desired result is shown in (3).

1)

Case  Date   
123   2019-12-21
123   2019-12-16
234   2019-12-21
345   2019-12-21

2)

Case   Date   Value

123 2019-12-20  0.4
123 2019-12-18  0.5
123 2019-12-14  1.2

3) Final

Case  Date_X     Date_Y      Value 
123 2019-12-21 2019-12-20   0.4
123 2019-12-16 2019-12-14   1.2
234 2019-12-21
345 2019-12-21

Below are the criteria that I am looking for:

1.) match based on 'case'.

2.) 'Date_Y' must be =< 'Date_X' and it must be the maximum date in the data-frame of 'case'.

3.) Display value that is corresponding to Date_Y.

I tried looking for similar codes but I could not find it.

Thank you in advance for helping.

Strange
  • 1,460
  • 1
  • 7
  • 18
Looney
  • 21
  • 4

1 Answers1

0

Here's what you want:

import numpy as np
import pandas as pd


df1 = pd.DataFrame({'case':[123,123,345,456],'date':['2019-12-21','2019-12-17','2019-12-21','2019-12-21']})
df2 = pd.DataFrame({'case':[123,123,123],'date':['2019-12-21','2019-12-18','2019-12-15'],'value':[0.4,0.5,1.2]})

df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])

datey = []
val = []
for i in range(len(df1)):
#checking case
    tmp=df2[df2['case']==df1['case'][i]]
#comparing date
    tmp = tmp[tmp['date']<=df1['date'][i]]
    if(len(tmp)>0):
        tmp = tmp.loc[tmp['date'].idxmax()]
#appending to lists
        datey.append(tmp.date)
        val.append(tmp.value)
    else:
        datey.append(None)
        val.append(None)

df1['date_y'] = datey
df1['value'] = val

print(df1)

Output:

   case       date     date_y  value
0   123 2019-12-21 2019-12-21    0.4
1   123 2019-12-17 2019-12-15    1.2
2   345 2019-12-21        NaT    NaN
3   456 2019-12-21        NaT    NaN

You can remove Nan values..

Strange
  • 1,460
  • 1
  • 7
  • 18