
I have a list of xts objects that are mutually exclusive days. I would like to merge the list into one large xts object. My attempt at doing this was:

merged_reg_1_min_prices <- do.call(cbind, reg_1_min_prices)

However, this seems to run out of memory. reg_1_min_prices holds 6,000 days of 1-minute returns on mutually exclusive days, so it's not very large. Does anyone know how to get around this?

To be clear: reg_1_min_prices contains mutually exclusive days with 1-minute prices on each day, and each entry in the list is an xts object.

Alex
  • Is `reg_1_min_prices` a list of 6,000 xts objects, where each list element is 1-minute data for a single day? – Joshua Ulrich Aug 19 '12 at 18:12
  • yes. apologies, let me add that to the question – Alex Aug 19 '12 at 18:14
  • Are you sure your want `cbind` instead of `rbind`? If you want to `rbind` the data, [this function](https://r-forge.r-project.org/scm/viewvc.php/pkg/qmao/R/do.call.rbind.R?view=markup&root=twsinstrument) may help – GSee Aug 19 '12 at 18:15
  • i tried `merge`, `rbind` and `cbind`. all of them fail unfortunately and run up the memory usage from around 4 gigs to about 60 gigs. – Alex Aug 19 '12 at 18:15
  • 1
    The function I referenced above is used in `FinancialInstrument:::getSymbols.FI` which works well to load and merge lots of days of high frequency data. – GSee Aug 19 '12 at 18:17
  • interesting: i did the same thing just in a `for` loop because i thought it would use less memory but it still increased rapidly. i'll give your function a shot. this might be a more systemic problem though since my data is pretty small? – Alex Aug 19 '12 at 18:28

3 Answers


I use the strategy provided by Dominik in his answer to this question.

I have turned it into a function in my `qmao` package. This code is also at the core of `getSymbols.FI` in the FinancialInstrument package.

do.call.rbind <- function(lst) {
  # Pairwise rbind: each pass combines adjacent elements, halving the list
  # length until one object remains. This avoids the memory blow-up of
  # rbind-ing thousands of objects in a single call.
  while(length(lst) > 1) {
    idxlst <- seq(from=1, to=length(lst), by=2)
    lst <- lapply(idxlst, function(i) {
      # odd element left over at the end: carry it forward unchanged
      if(i==length(lst)) { return(lst[[i]]) }
      return(rbind(lst[[i]], lst[[i+1]]))
    })
  }
  lst[[1]]
}
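For example, a minimal sketch with two toy days of made-up 1-minute data (the dates and values are just for illustration):

```r
library(xts)
# Two mutually exclusive days of toy 1-minute prices
day1 <- xts(rnorm(3), as.POSIXct("2012-08-17 09:30:00", tz="UTC") + 60*(0:2))
day2 <- xts(rnorm(3), as.POSIXct("2012-08-18 09:30:00", tz="UTC") + 60*(0:2))
x <- do.call.rbind(list(day1, day2))
nrow(x)  # 6: both days stacked into one series in time order
```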

If you want to `rbind` data.frames, @JoshuaUlrich has provided an elegant solution here.


As far as I can tell (without looking very closely) memory is not an issue with any of the three solutions offered (@JoshuaUlrich's, @Alex's, and qmao::do.call.rbind). So, it comes down to speed...

library(xts)
l <- lapply(Sys.Date()-6000:1, function(x) {
    N=60*8;xts(rnorm(N),as.POSIXct(x)-seq(N*60,1,-60))})
GS <- do.call.rbind
JU <- function(x) Reduce(rbind, x)
Alex <- function(x) do.call(rbind, lapply(x, as.data.frame)) #returns data.frame, not xts

identical(GS(l), JU(l)) #TRUE

library(rbenchmark)
benchmark(GS(l), JU(l), Alex(l), replications=1)
     test replications elapsed relative user.self sys.self user.child sys.child
3 Alex(l)            1  89.575 109.9080    56.584   33.044          0         0
1   GS(l)            1   0.815   1.0000     0.599    0.216          0         0
2   JU(l)            1 209.783 257.4025   143.353   66.555          0         0

do.call.rbind clearly wins on speed.

GSee

You don't want to use merge because it would return a 6,000-column object with a row for each row of each list element (2,880,000 rows in my example), and most of the values would be NA. cbind.xts simply calls merge.xts with a few default argument values, so you don't want to use that either.
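A minimal sketch of the difference, using two one-row days of made-up data:

```r
library(xts)
a <- xts(1, as.POSIXct("2012-08-17 09:30:00", tz="UTC"))
b <- xts(2, as.POSIXct("2012-08-18 09:30:00", tz="UTC"))
dim(merge(a, b))  # 2 x 2: one column per object, NA where the days don't overlap
dim(rbind(a, b))  # 2 x 1: the same rows stacked into a single series
```

With 6,000 single-day objects, merge scales this out to 6,000 columns, which is why rbind is the operation you want here.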

We're aware of the memory problem caused by calling rbind.xts via do.call. Jeff does have more efficient code, but it's a prototype that is not public.

An alternative to @GSee's solution is to use Reduce. This takes a while to run on my laptop, but memory is not an issue even with only 4GB.

library(xts)
l <- lapply(Sys.Date()-6000:1, function(x) {
  N=60*8;xts(rnorm(N),as.POSIXct(x)-seq(N*60,1,-60))})
x <- Reduce(rbind, l)
Joshua Ulrich

Here is how to do this efficiently: convert each xts object to a data.frame and simply rbind them. This barely raises memory usage at all. If necessary, you can then create a new xts object from the combined data.frame.
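A minimal sketch of that approach (the `time` column name and the rebuild step at the end are my own additions, not from the answer):

```r
library(xts)
# Three days of toy 1-minute data on mutually exclusive days
l <- lapply(Sys.Date() - 3:1, function(x) {
  N <- 60 * 8
  xts(rnorm(N), as.POSIXct(x) - seq(N * 60, 1, -60))
})
# Convert each xts object to a data.frame, keeping the index as a column
dfs <- lapply(l, function(x) data.frame(time = index(x), coredata(x)))
big <- do.call(rbind, dfs)
# If necessary, rebuild a single xts object from the combined data.frame
x <- xts(big[, -1, drop = FALSE], order.by = big$time)
```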

Alex
  • I don't think `do.call.rbind` uses much memory either, and it's magnitudes faster for the data I've tested (not even including the time to convert back to `xts`). – GSee Aug 19 '12 at 19:13
  • ok cool, let me give it a shot. do you happen to know what it is about the normal rbind that screws it all up? – Alex Aug 19 '12 at 19:18
  • No, but I bet @JoshuaUlrich does. If I recall correctly, Jeff has some uncommitted code to do this much more efficiently. – GSee Aug 19 '12 at 19:21