
I have a list of xts objects that are mutually exclusive days. I would like to merge the list into one large xts object. My attempt at doing this was:

merged_reg_1_min_prices <- do.call(cbind, reg_1_min_prices)

However, this seems to run out of memory. reg_1_min_prices holds 6,000 days of 1-minute returns on mutually exclusive days, so it's not very large. Does anyone know how to get around this?

To be clear: reg_1_min_prices contains mutually exclusive days with 1-minute prices on each day, and each entry in the list is an xts object.

Alex
  • Is `reg_1_min_prices` a list of 6,000 xts objects, where each list element is 1-minute data for a single day? – Joshua Ulrich Aug 19 '12 at 18:12
  • yes. apologies, let me add that to the question – Alex Aug 19 '12 at 18:14
  • Are you sure your want `cbind` instead of `rbind`? If you want to `rbind` the data, [this function](https://r-forge.r-project.org/scm/viewvc.php/pkg/qmao/R/do.call.rbind.R?view=markup&root=twsinstrument) may help – GSee Aug 19 '12 at 18:15
  • i tried `merge`, `rbind` and `cbind`. all of them fail unfortunately and run up the memory usage from around 4 gigs to about 60 gigs. – Alex Aug 19 '12 at 18:15
  • 1
    The function I referenced above is used in `FinancialInstrument:::getSymbols.FI` which works well to load and merge lots of days of high frequency data. – GSee Aug 19 '12 at 18:17
  • interesting: i did the same thing just in a `for` loop because i thought it would use less memory but it still increased rapidly. i'll give your function a shot. this might be a more systemic problem though since my data is pretty small? – Alex Aug 19 '12 at 18:28

3 Answers


I use the strategy provided by Dominik in his answer to this question.

I have turned it into a function in my `qmao` package. This code is also at the core of `getSymbols.FI` in the FinancialInstrument package.

do.call.rbind <- function(lst) {
  # Pairwise rbind: each pass combines adjacent elements, halving the list
  # length until one object remains. This avoids the memory blow-up of
  # rbind-ing thousands of objects in a single call.
  while(length(lst) > 1) {
    idxlst <- seq(from=1, to=length(lst), by=2)
    lst <- lapply(idxlst, function(i) {
      # odd element left over at the end: carry it forward unchanged
      if(i==length(lst)) { return(lst[[i]]) }
      return(rbind(lst[[i]], lst[[i+1]]))
    })
  }
  lst[[1]]
}
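For example, a minimal sketch with two toy days of made-up 1-minute data (the dates and values are just for illustration):

```r
library(xts)
# Two mutually exclusive days of toy 1-minute prices
day1 <- xts(rnorm(3), as.POSIXct("2012-08-17 09:30:00", tz="UTC") + 60*(0:2))
day2 <- xts(rnorm(3), as.POSIXct("2012-08-18 09:30:00", tz="UTC") + 60*(0:2))
x <- do.call.rbind(list(day1, day2))
nrow(x)  # 6: both days stacked into one series in time order
```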

If you want to `rbind` data.frames, @JoshuaUlrich has provided an elegant solution here.


As far as I can tell (without looking very closely) memory is not an issue with any of the three solutions offered (@JoshuaUlrich's, @Alex's, and qmao::do.call.rbind). So, it comes down to speed...

library(xts)
l <- lapply(Sys.Date()-6000:1, function(x) {
    N=60*8;xts(rnorm(N),as.POSIXct(x)-seq(N*60,1,-60))})
GS <- do.call.rbind
JU <- function(x) Reduce(rbind, x)
Alex <- function(x) do.call(rbind, lapply(x, as.data.frame)) #returns data.frame, not xts

identical(GS(l), JU(l)) #TRUE

library(rbenchmark)
benchmark(GS(l), JU(l), Alex(l), replications=1)
     test replications elapsed relative user.self sys.self user.child sys.child
3 Alex(l)            1  89.575 109.9080    56.584   33.044          0         0
1   GS(l)            1   0.815   1.0000     0.599    0.216          0         0
2   JU(l)            1 209.783 257.4025   143.353   66.555          0         0

do.call.rbind clearly wins on speed.

GSee

You don't want to use merge because it would return a 6,000-column object with a row for each row of each list element (2,880,000 rows in my example), and most of the values would be NA. cbind.xts simply calls merge.xts with a few default argument values, so you don't want to use that either.
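A minimal sketch of the difference, using two one-row days of made-up data:

```r
library(xts)
a <- xts(1, as.POSIXct("2012-08-17 09:30:00", tz="UTC"))
b <- xts(2, as.POSIXct("2012-08-18 09:30:00", tz="UTC"))
dim(merge(a, b))  # 2 x 2: one column per object, NA where the days don't overlap
dim(rbind(a, b))  # 2 x 1: the same rows stacked into a single series
```

With 6,000 single-day objects, merge scales this out to 6,000 columns, which is why rbind is the operation you want here.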

We're aware of the memory problem caused by calling rbind.xts via do.call. Jeff does have more efficient code, but it's a prototype that is not public.

An alternative to @GSee's solution is to use Reduce. This takes a while to run on my laptop, but memory is not an issue even with only 4GB.

library(xts)
l <- lapply(Sys.Date()-6000:1, function(x) {
  N=60*8;xts(rnorm(N),as.POSIXct(x)-seq(N*60,1,-60))})
x <- Reduce(rbind, l)
Joshua Ulrich

Here is how to do this efficiently: convert each xts object to a data.frame and simply rbind them. This barely raises memory usage at all. If necessary, you can then create a new xts object from the combined data.frame.
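A minimal sketch of that approach (the `time` column name and the rebuild step at the end are my own additions, not from the answer):

```r
library(xts)
# Three days of toy 1-minute data on mutually exclusive days
l <- lapply(Sys.Date() - 3:1, function(x) {
  N <- 60 * 8
  xts(rnorm(N), as.POSIXct(x) - seq(N * 60, 1, -60))
})
# Convert each xts object to a data.frame, keeping the index as a column
dfs <- lapply(l, function(x) data.frame(time = index(x), coredata(x)))
big <- do.call(rbind, dfs)
# If necessary, rebuild a single xts object from the combined data.frame
x <- xts(big[, -1, drop = FALSE], order.by = big$time)
```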

Alex
  • I don't think `do.call.rbind` uses much memory either, and it's magnitudes faster for the data I've tested (not even including the time to convert back to `xts`). – GSee Aug 19 '12 at 19:13
  • ok cool, let me give it a shot. do you happen to know what it is about the normal rbind that screws it all up? – Alex Aug 19 '12 at 19:18
  • No, but I bet @JoshuaUlrich does. If I recall correctly, Jeff has some uncommitted code to do this much more efficiently. – GSee Aug 19 '12 at 19:21