I am struggling to subset xts objects that are stored in a list. The subsetting should be based on row indices, because I want to split each object randomly 80/20 into a training and a test set. Here is an example:

library(xts)

# Create a sample list with dummy data
series <- list(
  A=xts(rnorm(n=200), as.Date("2015-01-01")+1:200),
  B=xts(rnorm(n=50), as.Date("2015-04-01")+1:50)
)

Note: the lengths of these xts objects differ on purpose.

trainIndex is a list of row numbers that splits each xts object 80/20, created with the createDataPartition function from the caret package:

# create an index of row numbers for splitting the dataset
library(caret)
trainIndex <- lapply(series, function(x) {createDataPartition(x, p=0.8)})

And this is what I was expecting to work:

series.test <- lapply(series, function(x) x[trainIndex,])

which it didn't.

This works for a 'static' vector (as per here):

trainIndex.simple <- seq(1,50,by=3)
lapply(series, function(x) x[trainIndex.simple,])

And this works on a single list element:

series$A[trainIndex$A[[1]],]

But how to apply the list of row indices on a list of xts objects? This post might be helpful somehow, but I couldn't translate it to my problem...

Any hint is very much appreciated!

Stephan

1 Answer

You need a function that loops over both lists at the same time, for example mapply or Map (a wrapper around mapply that never simplifies its result):

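set.seed(21)
# One logical vector per series: TRUE (keep the row) with probability 0.8
trainIndex <- lapply(series, function(x)
  sample(c(TRUE,FALSE), nrow(x), TRUE, c(0.8, 0.2)))
# Either call works; both pair each xts object with its own index
series.test <- mapply(function(x, i) x[i,], x=series, i=trainIndex)
series.test <- Map(function(x, i) x[i,], x=series, i=trainIndex)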
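
The same pattern can be applied to the index from the question. Since `series$A[trainIndex$A[[1]],]` works there, each element of that index appears to be a list wrapping one vector of row numbers. The following is only a sketch under that assumption: the `sample`-based index is a stand-in for `createDataPartition` (so the example runs without caret), and the element name `Resample1` and the name `series.train` are illustrative choices.

```r
library(xts)

set.seed(21)
series <- list(
  A = xts(rnorm(200), as.Date("2015-01-01") + 1:200),
  B = xts(rnorm(50),  as.Date("2015-04-01") + 1:50)
)

# Stand-in for createDataPartition(x, p = 0.8) (assumption): a one-element
# list holding sorted row numbers for 80% of the rows of each series.
trainIndex <- lapply(series, function(x) {
  n <- nrow(x)
  list(Resample1 = sort(sample(n, round(0.8 * n))))
})

# Walk both lists in parallel; i[[1]] unwraps the inner list.
series.train <- Map(function(x, i) x[i[[1]], ], series, trainIndex)

# Negative row numbers drop the training rows, leaving the test set.
series.test <- Map(function(x, i) x[-i[[1]], ], series, trainIndex)

sapply(series.train, nrow)
sapply(series.test, nrow)
```

Using the negated index for the test set keeps the two subsets disjoint without building a second index list.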
Joshua Ulrich
  • Thanks @'Joshua Ulrich', this did solve my problem. I realize there is a difference what type the subsetting element is: `createDataPartition` creates a numeric vector with row indices, while your solution creates a logical vector (TRUE/FALSE) - that seems to be important. – Stephan Oct 16 '15 at 06:51
  • @Stephan: whether `trainIndex` is numeric or logical should not matter. You can subset xts objects by either. You could transform my `trainIndex` via `lapply(trainIndex, which)` and you will obtain the same result when you use it to create `series.test`. – Joshua Ulrich Oct 16 '15 at 08:30
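
The equivalence described in the last comment can be checked with a short sketch. The names `logicalIndex`, `numericIndex`, `byLogical`, `byNumeric`, and `sameResult` are illustrative and not part of the original answer.

```r
library(xts)

set.seed(21)
series <- list(
  A = xts(rnorm(200), as.Date("2015-01-01") + 1:200),
  B = xts(rnorm(50),  as.Date("2015-04-01") + 1:50)
)

# Logical index, built as in the answer
logicalIndex <- lapply(series, function(x)
  sample(c(TRUE, FALSE), nrow(x), TRUE, c(0.8, 0.2)))

# Numeric index: the positions of the TRUE values
numericIndex <- lapply(logicalIndex, which)

byLogical <- Map(function(x, i) x[i, ], series, logicalIndex)
byNumeric <- Map(function(x, i) x[i, ], series, numericIndex)

# Compare the values and the dates of each pair of subsets
sameResult <- mapply(function(a, b) {
  isTRUE(all.equal(coredata(a), coredata(b))) &&
    identical(index(a), index(b))
}, byLogical, byNumeric)

all(sameResult)
```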