I want to rbind a bunch of xts objects, which should not overlap, but if they do overlap I don't want a row added twice: it should choose from one or the other. (I currently use duplicated(index(x)) to find the repeated timestamps, then delete them.)
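For reference, the current workaround looks roughly like the sketch below. It uses only rbind(), index() and duplicated(); the fixed base date and the variable names combined and deduped are assumptions added so the example is reproducible, not part of my real code.

```r
library(xts)

# Assumption: a fixed base date instead of Sys.Date(), so results are reproducible.
base <- as.Date("2015-07-08")
x <- xts(1:10, base + 1:10)
y <- xts(50:55, base + (-1:-6))
z <- xts(20:25, base + (-2:3))

# Plain rbind keeps every row, including rows that share a timestamp.
combined <- rbind(x, y, z)

# duplicated() flags the second and later rows for each timestamp;
# negating it keeps only the first row seen for each one.
deduped <- combined[!duplicated(index(combined))]

nrow(combined)  # all rows from x, y and z
nrow(deduped)   # one row per distinct timestamp
```

This works, but it builds the full object with duplicates first and then makes a second pass over the index, which is why a built-in option would be nicer.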
(Example code showing the problem, and the desired output, is below.)
Poking around, I found the C source has a dup parameter; it defaults to FALSE, and when I set it to TRUE I get exactly the behaviour I wanted:
.External("rbindXts", dup = T, x,y,z, PACKAGE = "xts")
Is there a good reason this wasn't exposed in the rbind() interface? (By "good" reason I mean something like: it is known to be buggy, or it performs really badly on large data.) Or is there a more practical reason, such as no-one having had time to write tests and documentation for it yet?
UPDATE:
I went back to my code, and found I wasn't actually using rbind.xts, but instead the do.call.rbind() function described here: https://stackoverflow.com/a/9729804/841830, due to a memory issue in rbind.xts.
(I also found my own (!) question from three years ago, which describes how I delete duplicates: "How to remove a row from zoo/xts object, given a timestamp".)
UPDATE #2:
do.call.rbind can be modified to use the .External call:
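# Pairwise rbind that passes dup = TRUE to the internal xts C routine.
do.call.rbind.xts.no_dup <- function(lst) {
  while (length(lst) > 1) {
    # Take every other index; each is merged with its right-hand neighbour.
    idxlst <- seq(from = 1, to = length(lst), by = 2)
    lst <- lapply(idxlst, function(i) {
      # An odd element at the end has no partner, so pass it through unchanged.
      if (i == length(lst)) { return(lst[[i]]) }
      return(.External("rbindXts", dup = TRUE, lst[[i]], lst[[i + 1]], PACKAGE = "xts"))
    })
  }
  lst[[1]]
}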
I tested this with the same test data shown here: https://stackoverflow.com/a/12029366/841830. It has the same performance as do.call.rbind and produces the same 2.8 million row xts object (which is good). Of course, that test data has no duplicates, so it may not be a fair test of the dup = TRUE path.
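To exercise the duplicate case without relying on the internal call, the same pairwise pattern can be sketched using only the public rbind() plus duplicated(). This is a stand-in for comparison, not the dup = TRUE code path: the helper name rbind_pairs_dedup, the fixed base date and the small overlapping test list are all assumptions made up for illustration.

```r
library(xts)

# Hypothetical helper: pairwise rbind that drops repeated timestamps after
# each merge, keeping the first row rbind() places at each timestamp.
rbind_pairs_dedup <- function(lst) {
  while (length(lst) > 1) {
    idx <- seq(from = 1, to = length(lst), by = 2)
    lst <- lapply(idx, function(i) {
      # An unpaired last element is carried into the next round as-is.
      if (i == length(lst)) return(lst[[i]])
      merged <- rbind(lst[[i]], lst[[i + 1]])
      merged[!duplicated(index(merged))]
    })
  }
  lst[[1]]
}

# Assumption: a fixed base date so the overlap pattern is reproducible.
base <- as.Date("2015-07-08")
pieces <- list(
  xts(1:10, base + 1:10),
  xts(50:55, base + (-1:-6)),
  xts(20:25, base + (-2:3))   # overlaps both of the objects above
)

result <- rbind_pairs_dedup(pieces)
nrow(result)                     # one row per distinct timestamp
any(duplicated(index(result)))   # FALSE
```

Note that a list of length one is returned untouched, so duplicates inside a single input object are not removed in that case. Comparing the output of this sketch against the .External version on overlapping data would be one way to check that the two agree.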
library(xts)
x <- xts(1:10, Sys.Date() + 1:10)
y <- xts(50:55, Sys.Date() + (-1:-6))
z <- xts(20:25, Sys.Date() + (-2:+3))
rbind(x, y, z)
This gives the following output (with *** marking the undesired lines):
2015-07-02 55
2015-07-03 54
2015-07-04 53
2015-07-05 52
2015-07-06 51
2015-07-06 20 ***
2015-07-07 50
2015-07-07 21 ***
2015-07-08 22
2015-07-09 1
2015-07-09 23 ***
2015-07-10 2
2015-07-10 24 ***
2015-07-11 3
2015-07-11 25 ***
2015-07-12 4
2015-07-13 5
2015-07-14 6
2015-07-15 7
2015-07-16 8
2015-07-17 9
2015-07-18 10
Whereas .External("rbindXts", dup = T, x,y,z, PACKAGE = "xts") gives:
2015-07-02 55
2015-07-03 54
2015-07-04 53
2015-07-05 52
2015-07-06 51
2015-07-07 50
2015-07-08 22
2015-07-09 1
2015-07-10 2
2015-07-11 3
2015-07-12 4
2015-07-13 5
2015-07-14 6
2015-07-15 7
2015-07-16 8
2015-07-17 9
2015-07-18 10