Is there an R equivalent of Python's pandas.merge_asof?

Question

pandas.merge_asof merges two dataframes, does a left join except it matches on the nearest key rather than equal keys.

Example (stolen from the documentation):

>>> quotes
                     time ticker     bid     ask
0 2016-05-25 13:30:00.023   GOOG  720.50  720.93
1 2016-05-25 13:30:00.023   MSFT   51.95   51.96
2 2016-05-25 13:30:00.030   MSFT   51.97   51.98
3 2016-05-25 13:30:00.041   MSFT   51.99   52.00
4 2016-05-25 13:30:00.048   GOOG  720.50  720.93
5 2016-05-25 13:30:00.049   AAPL   97.99   98.01
6 2016-05-25 13:30:00.072   GOOG  720.50  720.88
7 2016-05-25 13:30:00.075   MSFT   52.01   52.03

>>> trades
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   GOOG  720.77       100
3 2016-05-25 13:30:00.048   GOOG  720.92       100
4 2016-05-25 13:30:00.048   AAPL   98.00       100

>>> pd.merge_asof(trades, quotes,
...                       on='time',
...                       by='ticker')
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

In the above example, pd.merge_asof matches each row of trades with the row of quotes having the same ticker and the nearest time.

I find this operation quite indispensible in my workflow, and I have been racking my brains trying to think how to accomplish this in R. Of course I could just do the operation in python and read the data frame back in R, but part of my motivation is to learn R.

Please check out https://stackoverflow.com/questions/6709151/how-do-i-combine-two-data-frames-based-on-two-columns — programandoconro, Oct 24 '19 at 09:36
And also [stackoverflow.com/questions/49156597](https://stackoverflow.com/questions/49156597/merge-data-based-on-nearest-date-r) for join on nearest date. — meriops, Oct 24 '19 at 09:58

score 2 · Accepted Answer · answered Oct 24 '19 at 09:44

You can use the data.table package to do a rolling join:

trades[quotes, on=.(ticker, time), roll=-Inf, c("bid","ask") := .(bid, ask)]

output:

                  time ticker  price quantity    bid    ask
1: 2016-05-25 13:30:00   MSFT  51.95       75  51.95  51.96
2: 2016-05-25 13:30:00   MSFT  51.95      155  51.97  51.98
3: 2016-05-25 13:30:00   GOOG 720.77      100 720.50 720.93
4: 2016-05-25 13:30:00   GOOG 720.92      100 720.50 720.93
5: 2016-05-25 13:30:00   AAPL  98.00      100     NA     NA

data:

library(data.table)

quotes <- fread("time ticker     bid     ask
2016-05-25_13:30:00.023   GOOG  720.50  720.93
2016-05-25_13:30:00.023   MSFT   51.95   51.96
2016-05-25_13:30:00.030   MSFT   51.97   51.98
2016-05-25_13:30:00.041   MSFT   51.99   52.00
2016-05-25_13:30:00.048   GOOG  720.50  720.93
2016-05-25_13:30:00.049   AAPL   97.99   98.01
2016-05-25_13:30:00.072   GOOG  720.50  720.88
2016-05-25_13:30:00.075   MSFT   52.01   52.03")

trades <- fread("time ticker   price  quantity
2016-05-25_13:30:00.023   MSFT   51.95        75
2016-05-25_13:30:00.038   MSFT   51.95       155
2016-05-25_13:30:00.048   GOOG  720.77       100
2016-05-25_13:30:00.048   GOOG  720.92       100
2016-05-25_13:30:00.048   AAPL   98.00       100")

quotes[, time := as.POSIXct(time, format="%Y-%m-%d_%H:%M:%OS")]  
trades[, time := as.POSIXct(time, format="%Y-%m-%d_%H:%M:%OS")]

BTW, would you know of something using just data.frame, that is without requiring external packages? — Alex Smart, Oct 24 '19 at 11:29

score 1 · Answer 2 · answered Oct 24 '19 at 13:43

Complex joins can be done using SQL (where the test input is shown reproducibly in the Note at the end). One of the advantages of this approach is that it is quite clear what criteria are being used from the SQL statement.

Assuming you want to join on ticker and time difference less than .002

library(sqldf)

sqldf("select t.*, q.bid, q.ask
  from trades t
  left join quotes q on t.ticker = q.ticker and abs(q.time - t.time) < .002")

giving:

                 time ticker  price quantity    bid    ask
1 2016-05-25 13:30:00   MSFT  51.95       75  51.95  51.96
2 2016-05-25 13:30:00   MSFT  51.95      155     NA     NA
3 2016-05-25 13:30:00   GOOG 720.77      100 720.50 720.93
4 2016-05-25 13:30:00   GOOG 720.92      100 720.50 720.93
5 2016-05-25 13:30:00   AAPL  98.00      100  97.99  98.01

or to join on ticker and minimum time difference:

sqldf("select t.*, q.bid, q.ask, min(abs(q.time - t.time))
  from trades t
  left join quotes q on t.ticker = q.ticker
  group by t.rowid")[1:6]

giving:

                 time ticker  price quantity    bid    ask
1 2016-05-25 13:30:00   MSFT  51.95       75  51.95  51.96
2 2016-05-25 13:30:00   MSFT  51.95      155  51.99  52.00
3 2016-05-25 13:30:00   GOOG 720.77      100 720.50 720.93
4 2016-05-25 13:30:00   GOOG 720.92      100 720.50 720.93
5 2016-05-25 13:30:00   AAPL  98.00      100  97.99  98.01

or to join on minimum difference within time difference of 0.002

sqldf("select t.*, q.bid, q.ask, min(abs(q.time - t.time))
  from trades t
  left join quotes q on t.ticker = q.ticker and abs(q.time - t.time) < 0.002
  group by t.rowid")[1:6]

giving:

                 time ticker  price quantity    bid    ask
1 2016-05-25 13:30:00   MSFT  51.95       75  51.95  51.96
2 2016-05-25 13:30:00   MSFT  51.95      155     NA     NA
3 2016-05-25 13:30:00   GOOG 720.77      100 720.50 720.93
4 2016-05-25 13:30:00   GOOG 720.92      100 720.50 720.93
5 2016-05-25 13:30:00   AAPL  98.00      100  97.99  98.01

Note

Lines1 <- "
                     time ticker     bid     ask
0 2016-05-25T13:30:00.023   GOOG  720.50  720.93
1 2016-05-25T13:30:00.023   MSFT   51.95   51.96
2 2016-05-25T13:30:00.030   MSFT   51.97   51.98
3 2016-05-25T13:30:00.041   MSFT   51.99   52.00
4 2016-05-25T13:30:00.048   GOOG  720.50  720.93
5 2016-05-25T13:30:00.049   AAPL   97.99   98.01
6 2016-05-25T13:30:00.072   GOOG  720.50  720.88
7 2016-05-25T13:30:00.075   MSFT   52.01   52.03"
quotes <- read.table(text = Lines1, as.is = TRUE)
quotes <- transform(quotes, time = as.POSIXct(sub("T", " ", time)))

Lines2 <- "
                     time ticker   price  quantity
0 2016-05-25T13:30:00.023   MSFT   51.95        75
1 2016-05-25T13:30:00.038   MSFT   51.95       155
2 2016-05-25T13:30:00.048   GOOG  720.77       100
3 2016-05-25T13:30:00.048   GOOG  720.92       100
4 2016-05-25T13:30:00.048   AAPL   98.00       100"
trades <- read.table(text = Lines2, as.is = TRUE)
trades <- transform(trades, time = as.POSIXct(sub("T", " ", time)))

score 1 · Answer 3 · edited Aug 03 '21 at 22:28

You can also use the data.table package to perform a non-equi join:

quotes[trades, on=.(ticker, time<=time), .(time=i.time, ticker, price, quantity,  bid, ask), mult='last']

This gives more control and is easier to tweak upon other matching criteria. And the result is the same.

                      time ticker  price quantity    bid    ask
1: 2016-05-25 13:30:00.023   MSFT  51.95       75  51.95  51.96
2: 2016-05-25 13:30:00.038   MSFT  51.95      155  51.97  51.98
3: 2016-05-25 13:30:00.048   GOOG 720.77      100 720.50 720.93
4: 2016-05-25 13:30:00.048   GOOG 720.92      100 720.50 720.93
5: 2016-05-25 13:30:00.048   AAPL  98.00      100     NA     NA

score 0 · Answer 4 · answered Mar 11 '20 at 19:17

0

The fuzzyjoin package has exactly this functionality (joining based on criteria). For example: How can I match fuzzy match strings from two datasets?

answered Mar 11 '20 at 19:17

Arthur Yip

5,810
2
31
50

score -1 · Answer 5 · answered Oct 24 '19 at 09:41

-1

You can use the merge function to join two data frames in R

merge(trades,quotes,by="ticker",all=TRUE)

answered Oct 24 '19 at 09:41

Ashish

337
3
11

This definitely does not work. I want to match on nearest time, not the exact time. – Alex Smart Oct 24 '19 at 11:24

Is there an R equivalent of Python's pandas.merge_asof?

5 Answers5

Note