0

I asked a question earlier, about merging dataframes on nearest datetime and by ID and was suggested merge_asof, but i get NaN values in columns. Example df's below:

df1
  ID1       active_at
0 111       2020-01-01 00:50:00.000
1 222       2020-01-05 05:00:00.000
2 333       2020-01-10 10:00:00.000

df2
  ID2       timeObserved          Value
0 111       2020-01-01 00:55:00   1
1 111       2020-01-05 01:00:00   2
2 222       2020-01-10 05:50:00   3
3 222       2020-01-15 04:55:00   4
4 333       2020-01-20 10:10:00   5
5 333       2020-01-20 11:00:00   6

I wanted to get a result similar to:

df3
  ID1       active_at                 Value
0 111       2020-01-01 00:50:00.000   1
1 222       2020-01-05 05:00:00.000   4
2 333       2020-01-10 10:00:00.000   5

or

df3
  ID1       active_at                 Value    ID2    timeObserved
0 111       2020-01-01 00:50:00.000   1        111    2020-01-01 00:55:00
1 222       2020-01-05 05:00:00.000   4        222    2020-01-15 04:55:00
2 333       2020-01-10 10:00:00.000   5        333    2020-01-20 10:10:00

I used this to merge them:

df3 = pd.merge_asof(df1,df2, left_on='active_at', right_on='timeObserved', left_by='ID1', 
         right_by='ID2', direction='nearest')

But i am getting this result:

df1
  ID1       active_at                 Value    ID2    timeObserved
0 111       2020-01-01 00:50:00.000   NaN      NaN    NaT
1 222       2020-01-05 05:00:00.000   NaN      Nan    Nat
2 333       2020-01-10 10:00:00.000   5        333    2020-01-20 10:10:00

On the real data, i get 2 out of 50 merged by ID's correctly. I am sure there is matching ID's, which is why i am wondering.

I have tried: Setting the tolerance=pd.Timedelta('2d')to multiple days. Double checked that the dType is correct before merging. dateTime64[ns] on 'active_at' and 'timeObserved' , and int on 'value' and 'ID'. Made sure there is matching ID's.

When i use the df3.info(), i notice that the dType of 'Value' and 'ID2' has changed to float64. I cannot figure out why, is it because of the NaN values or some mistake i have made?

There is 2.732 rows in df1 and 40.500 in df2.

ChrDahl
  • 49
  • 4

0 Answers0