I asked a question earlier, about merging dataframes on nearest datetime and by ID and was suggested merge_asof, but i get NaN values in columns. Example df's below:
df1
ID1 active_at
0 111 2020-01-01 00:50:00.000
1 222 2020-01-05 05:00:00.000
2 333 2020-01-10 10:00:00.000
df2
ID2 timeObserved Value
0 111 2020-01-01 00:55:00 1
1 111 2020-01-05 01:00:00 2
2 222 2020-01-10 05:50:00 3
3 222 2020-01-15 04:55:00 4
4 333 2020-01-20 10:10:00 5
5 333 2020-01-20 11:00:00 6
I wanted to get a result similar to:
df3
ID1 active_at Value
0 111 2020-01-01 00:50:00.000 1
1 222 2020-01-05 05:00:00.000 4
2 333 2020-01-10 10:00:00.000 5
or
df3
ID1 active_at Value ID2 timeObserved
0 111 2020-01-01 00:50:00.000 1 111 2020-01-01 00:55:00
1 222 2020-01-05 05:00:00.000 4 222 2020-01-15 04:55:00
2 333 2020-01-10 10:00:00.000 5 333 2020-01-20 10:10:00
I used this to merge them:
df3 = pd.merge_asof(df1,df2, left_on='active_at', right_on='timeObserved', left_by='ID1',
right_by='ID2', direction='nearest')
But i am getting this result:
df1
ID1 active_at Value ID2 timeObserved
0 111 2020-01-01 00:50:00.000 NaN NaN NaT
1 222 2020-01-05 05:00:00.000 NaN Nan Nat
2 333 2020-01-10 10:00:00.000 5 333 2020-01-20 10:10:00
On the real data, i get 2 out of 50 merged by ID's correctly. I am sure there is matching ID's, which is why i am wondering.
I have tried: Setting the tolerance=pd.Timedelta('2d')to multiple days. Double checked that the dType is correct before merging. dateTime64[ns] on 'active_at' and 'timeObserved' , and int on 'value' and 'ID'. Made sure there is matching ID's.
When i use the df3.info(), i notice that the dType of 'Value' and 'ID2' has changed to float64. I cannot figure out why, is it because of the NaN values or some mistake i have made?
There is 2.732 rows in df1 and 40.500 in df2.