I want to rbind a bunch of xts objects that should not overlap. If they do overlap, I don't want a timestamp to appear twice: the result should keep the row from one object or the other. (I currently find the duplicates with duplicated(index(x)) and then delete them.)
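For reference, the "rbind, then delete duplicates" workaround can be sketched as below. This is only an illustration: rbind_first_wins is a hypothetical helper name, not part of xts, and the sample objects a and b are made up. It keeps whichever row comes first in the combined object for each timestamp.

```r
library(xts)

# Hypothetical helper (not part of xts): row-bind the inputs, then keep
# only the first row found for each timestamp in the combined object.
rbind_first_wins <- function(...) {
  combined <- rbind(...)
  combined[!duplicated(index(combined)), ]
}

# Made-up sample data: b overlaps a on two dates.
base <- as.Date("2015-07-08")
a <- xts(1:3, base + 0:2)
b <- xts(11:13, base + 1:3)

res <- rbind_first_wins(a, b)
print(res)
```

The cost of this approach is that the full object, duplicates included, is built before any row is dropped.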

(Example code showing the problem, and the desired output, is below).

Poking around, I found the C source has a dup parameter; it defaults to FALSE and when I set it to TRUE I get exactly the behaviour I wanted:

.External("rbindXts", dup = TRUE, x, y, z, PACKAGE = "xts")

Is there a good reason this wasn't exposed in the rbind() interface? (By "good" reason I mean along the lines of it is known to be buggy, or really bad performance on large data, or something like that.) Or a more practical reason, such as no-one has had time to write tests and documentation for it yet?

UPDATE: I went back to my code and found I wasn't actually using rbind.xts, but the do.call.rbind() function described here: https://stackoverflow.com/a/9729804/841830, because of a memory issue in rbind.xts.

(I also found my own (!) question from three years ago, "How to remove a row from zoo/xts object, given a timestamp", which describes how I delete duplicates.)

UPDATE #2:

do.call.rbind can be modified to use the External call:

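# Pairwise reduction of a list of xts objects: each pass binds neighbouring
# pairs, roughly halving the list, until one object is left.
# The bind itself goes through the unexported C routine with dup = TRUE.
do.call.rbind.xts.no_dup <- function(lst) {
  while (length(lst) > 1) {
    # Index of the first element of each pair: 1, 3, 5, ...
    idxlst <- seq(from = 1, to = length(lst), by = 2)
    lst <- lapply(idxlst, function(i) {
      # A trailing element with no partner is passed through unchanged
      if (i == length(lst)) { return(lst[[i]]) }
      .External("rbindXts", dup = TRUE, lst[[i]], lst[[i + 1]], PACKAGE = "xts")
    })
  }
  lst[[1]]
}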

I tested this with the same test data shown here: https://stackoverflow.com/a/12029366/841830, and it has the same performance, and produces the same 2.8 million row xts object, as do.call.rbind (which is good). Of course, that test data has no duplicates, so maybe not a fair test?
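A fairer test would use chunks that do overlap. The sketch below is one way to set that up, under stated assumptions: the chunk sizes and dates are made up, and reference and check_no_dup are hypothetical helper names. The reference combiner uses only rbind() and duplicated(), so it runs without the .External call; the checker accepts any function that takes a list of xts objects.

```r
library(xts)

# Made-up test data: 50 chunks of 100 daily rows. Each chunk starts 90 days
# after the previous one, so neighbouring chunks share 10 dates.
start <- as.Date("2000-01-01")
chunks <- lapply(0:49, function(k) {
  xts(seq_len(100) + 1000 * k, start + 90 * k + 0:99)
})

# Reference combiner built only from documented functions:
# bind everything, then drop repeated timestamps.
reference <- function(lst) {
  combined <- do.call(rbind, lst)
  combined[!duplicated(index(combined)), ]
}

# Hypothetical checker: TRUE if combine(lst) has exactly one row for each
# distinct timestamp found in the inputs, in increasing order.
check_no_dup <- function(combine, lst) {
  res <- combine(lst)
  expected <- sort(unique(do.call(c, lapply(lst, index))))
  idx <- index(res)
  length(idx) == length(expected) && all(idx == expected)
}

ok <- check_no_dup(reference, chunks)
print(ok)
```

Passing do.call.rbind.xts.no_dup in place of reference would run the same check against the dup = TRUE code path. The check covers only the index, not which of the duplicated rows is kept.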


library(xts)

x <- xts(1:10, Sys.Date() + 1:10)
y <- xts(50:55, Sys.Date() + (-1:-6))
z <- xts(20:25, Sys.Date() + (-2:+3))
rbind(x, y, z)

This gives the following output (the *** marks the undesired lines):

2015-07-02   55
2015-07-03   54
2015-07-04   53
2015-07-05   52
2015-07-06   51
2015-07-06   20   ***
2015-07-07   50
2015-07-07   21   ***
2015-07-08   22
2015-07-09    1
2015-07-09   23   ***
2015-07-10    2
2015-07-10   24   ***
2015-07-11    3
2015-07-11   25   ***
2015-07-12    4
2015-07-13    5
2015-07-14    6
2015-07-15    7
2015-07-16    8
2015-07-17    9
2015-07-18   10

Whereas .External("rbindXts", dup = TRUE, x, y, z, PACKAGE = "xts") gives:

2015-07-02   55
2015-07-03   54
2015-07-04   53
2015-07-05   52
2015-07-06   51
2015-07-07   50
2015-07-08   22
2015-07-09    1
2015-07-10    2
2015-07-11    3
2015-07-12    4
2015-07-13    5
2015-07-14    6
2015-07-15    7
2015-07-16    8
2015-07-17    9
2015-07-18   10
Darren Cook
  • I'm very interested in a reproducible example that demonstrates the memory issues you encounter with `rbind.xts`. – Joshua Ulrich Jul 08 '15 at 13:46
  • @JoshuaUlrich Use test data in http://stackoverflow.com/a/12029366/841830 and do `do.call(rbind.xts,l)`. (But don't do this - my machine became very unresponsive when I tried it earlier to see if the problem had been fixed since @GSee's benchmarking.) – Darren Cook Jul 08 '15 at 14:13
  • Thanks. I think I have a fix for that. The code is currently protecting each new object in a "recursive" while loop and it doesn't unprotect them until the loop has finished. That means all intermediate objects are held in memory and unavailable for garbage collection... ouch. – Joshua Ulrich Jul 08 '15 at 15:21

1 Answer

Looking at the commit where it was added, it seems it was experimental. So it's not exposed for practical reasons: it's untested/undocumented.

Joshua Ulrich
  • Thanks. I've switched to using it, with no regressions so far. I'll be throwing a lot more data at it later today (if my other coding goes to plan), then maybe I can give a patch to expose it and document it. – Darren Cook Jul 08 '15 at 14:22