4

I have 2 dataframes, and would like to insert a row into df1 with a value from df2, based on the nearest datetime and matching ID. Example df's below:

df1
  storeID   datetime
0 111       2020-01-01 00:50:00
1 222       2020-01-05 05:00:00
2 333       2020-01-10 10:00:00

df2
  storeID   datetime              Value
0 111       2020-01-01 00:55:00   1
1 111       2020-01-05 01:00:00   2
2 222       2020-01-10 05:50:00   3
3 222       2020-01-15 04:55:00   4
4 333       2020-01-20 10:10:00   5
5 333       2020-01-20 11:00:00   6

The output should be something like this:

df1
  storeID   datetime              Value
0 111       2020-01-01 00:50:00   1
1 222       2020-01-05 05:00:00   4
2 333       2020-01-10 10:00:00   5

I came across this question, which is quite similar, however, it is not considering that ID's should match. In the real example do df1 have 50.000+ rows and df2 has 200.000+ rows.

Ravi
  • 2,778
  • 2
  • 20
  • 32
ChrDahl
  • 49
  • 4

1 Answers1

1

Merge is not exactly the solution to this. Use merge_asof:

pd.merge_asof(df1,df2,on='datetime', by='storeID', direction='nearest')

Output:

   storeID            datetime  Value
0      111 2020-01-01 00:50:00      1
1      222 2020-01-05 05:00:00      3
2      333 2020-01-10 10:00:00      5
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • It merges now, however, in the value column, i get mostly NaN values - maybe 1% i correct. A few merges correctly, but cannot figure out the reason for this, as the values is present in both columns. I tried looking through the documentation, but none of the parameters and tried setting a tolerance, but this did not work. Any suggestions? – ChrDahl Dec 19 '20 at 10:40