How to split data in lists and find column with highest mean?

Question

I have a time series of values which looks like this:

Date           Stock1     Stock2     Stock3     Stock4     Stock5
2014-12-12 0.43049618 0.62012496 0.82292694 0.51887804 0.56065709
2014-12-15 0.69277671 1.00000000 0.98740608 0.77923007 1.00000000
2014-12-16 0.74597271 0.55805289 0.84390294 0.97395234 0.95619083
2014-12-17 0.39953887 0.71545285 0.85846613 0.85124830 0.73209062
2014-12-18 0.51999191 0.50113488 0.69509923 0.68881303 0.66698738
2014-12-19 0.38783599 0.68697817 0.76113802 0.68295281 0.74030056
2014-12-22 0.70420921 0.92787280 0.87447896 0.87722413 0.95003376
2014-12-23 0.57677722 0.71422496 0.00000000 0.81869002 0.92373912
2014-12-24 0.44820196 0.45297937 1.00000000 0.70607749 0.54608327
2014-12-26 0.33693471 0.70917672 1.00000000 0.61128286 0.69813454
2014-12-29 0.47741823 0.71516554 0.86265631 0.76560783 0.62194656
2014-12-30 0.59689325 0.94509918 0.90707156 0.57156757 0.74528902
2014-12-31 0.46160632 0.78835863 0.55488135 0.49777964 0.63122553


    > dput(head(efficiency.scores[,c(1,2,3,4,5)], n=15))
structure(c(0.44696179, 0.395227931, 0.477439822, 0.295309508, 
0.712614891, 0.689317114, 0.599395023, 0.610971864, 0.337625508, 
0.529290134, 0.596002106, 0.412324483, 0.244831259, 0.443123542, 
0.484748065, 0.686165972, 0.711764909, 0.604578061, 0.42144923, 
0.669898641, 0.735845192, 0.592157589, 0.81714156, 0.380346873, 
0.684386001, 0.672967504, 0.508142689, 0.244274776, 0.548213564, 
0.417804342, 0.612475603, 0.665148957, 0.756447435, 0.582448567, 
1, 1, 1, 1, 1, 1, 0.71708817, 0.528262036, 0.597354154, 0.886971243, 
0.624771744, 0.498557661, 0.382554107, 0.464373083, 0.425888914, 
0.747806533, 0.788271626, 0.407617084, 0.784747938, 0.466987506, 
0.554976586, 0.621751352, 0.501173993, 0.323827823, 0.659625721, 
0.502665703, 0.626577183, 0.458883576, 0.572507952, 0.388946538, 
0.897384403, 0.784054708, 0.652210478, 0.850226608, 0.514172118, 
0.780114865, 0.710307692, 0.714749488, 0.248817293, 0.576462902, 
0.690210031), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1288828800, 
1288915200, 1289174400, 1289260800, 1289347200, 1289433600, 1289520000, 
1289779200, 1289865600, 1289952000, 1290038400, 1290124800, 1290384000, 
1290470400, 1290556800), tzone = "UTC", tclass = "Date"), .Dim = c(15L, 
5L), .Dimnames = list(NULL, c("Stock1", "Stock10", "Stock100", 
"Stock101", "Stock102")))
>

First, I need to split this xts object into periods of n rows each.

I have tried the following:

n = 10

    list <- split.xts(data, f = "weeks", drop = TRUE, k =n )
    list <- split(data, f = n, drop=TRUE)
    list <- split(data, rep(1:nrow(efficiency.scores), each = n))

The first one returns a number of list elements not equal to 10. The last one returns a list of 1042 items, which is exactly the number of rows in the original data file; it should be 1042 / 10. I would also like to drop the trailing rows if fewer than n of them remain at the end.

Let's assume the list issue is solved. Narrowing it down to each element in the list: the second thing I need is to calculate the mean of all values in each column and find which column names have a mean that lies between a and b.

I have tried the following:

a <- 0.9
b <- 1

#Calculate means of columns
means<- as.data.frame(colMeans(test))

#Find row names with mean values between a and b

n <- means[which( means[,1] > 0.9),]

n <- means[apply(means[, -1], MARGIN = 1, function(x) { x > 0.9}), ]

n <- rownames(which(means[,1] > 0.9))

I get errors all around.

Please edit your post to be relevant to the data you provided (and `dput` preferred). For instance **"... returns 1042 items which is the number of rows I have..."** But you have only provided data on 13 rows... In other words, your example is not [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Please make it so. — alexwhitworth, Oct 15 '15 at 23:41

alexwhitworth · Accepted Answer · 2015-10-16T00:00:37.143


While waiting on your data, I'll use `data(sample_matrix, package = "xts")`.

You can split an xts object with `split` as specified. Note that your syntax is somewhat confusing: `split(...)` (your 2nd method) is the same as your first method, `split.xts(...)`, since method dispatch calls `split.xts` in each case. However, `n` is not a valid argument to parameter `f` in `split.xts`.

I believe the typical preference is to use `split` and let method dispatch do its thing.
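As a rough illustration of the dispatch described above, the sketch below calls plain `split` on an xts object and also uses the `k` argument of `split.xts` to group several periods together. It assumes the `sample_matrix` data shipped with xts; the names `weekly` and `biweekly` are only illustrative.

```r
library(xts)
data(sample_matrix, package = "xts")

# sample_matrix is a plain matrix, so convert it to xts first
x <- as.xts(sample_matrix)

# split() dispatches to split.xts because x is an xts object
weekly <- split(x, f = "weeks")

# k groups several periods per list element (here: two weeks each)
biweekly <- split(x, f = "weeks", k = 2)

length(weekly)
length(biweekly)
```

Every row ends up in exactly one list element, so the number of elements depends on the calendar span of the data rather than on a fixed row count.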

library(xts)
data(sample_matrix, package = "xts")
# sample_matrix is a plain matrix; convert it so that split() dispatches to split.xts
x <- as.xts(sample_matrix)
x2 <- split(x, f = "weeks")
# get column means for a single xts
cm <- colMeans(x)
a <- 49
b <- 49.2
# names of the columns whose mean lies strictly between a and b
names(cm)[which(cm > a & cm < b)]

## or for a list
c_means <- lapply(x2, colMeans)
stks <- lapply(c_means, function(x, a, b) {names(x)[which(x > a & x < b)]}, a = a, b = b)
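Since the title asks for the column with the highest mean, here is a minimal sketch of that step for each list element, again assuming the `sample_matrix` data from xts. The name `top_col` is hypothetical, and `which.max` returns only the first column if two means tie.

```r
library(xts)
data(sample_matrix, package = "xts")

x  <- as.xts(sample_matrix)
x2 <- split(x, f = "weeks")

# For each weekly block, take the name of the column with the largest mean
top_col <- vapply(
  x2,
  function(p) names(which.max(colMeans(p))),
  character(1)
)

head(top_col)
```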

edit -- your data

library(xts)
x <- structure(...)
x2 <- split(x, f= "weeks")
a <- .2 # for a non-zero result
b <- 1
c_means <- lapply(x2, colMeans) 
stks <- lapply(c_means, function(x,a,b) {names(x)[which((x > a & x < b))]}, a= a, b= b)
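The question also asks for blocks of exactly n rows with an incomplete tail dropped, which calendar-based splitting does not give. One possible sketch, assuming the `sample_matrix` data from xts and the illustrative names `n_full`, `idx` and `chunks`, is to build the grouping from row positions. Note that `rep(1:nrow(x), each = n)` produces one distinct label per row, which is probably why the third attempt in the question returned as many items as there are rows.

```r
library(xts)
data(sample_matrix, package = "xts")

x <- as.xts(sample_matrix)

n      <- 7                 # rows per block (illustrative value)
n_full <- nrow(x) %/% n     # number of complete blocks

# Group labels 1,1,...,2,2,... covering only the rows in complete blocks;
# any trailing rows beyond n_full * n are left out
idx <- split(seq_len(n_full * n), rep(seq_len(n_full), each = n))

# Subset the xts object by row positions, one block per list element
chunks <- lapply(idx, function(i) x[i, ])

length(chunks)
```

Each element of `chunks` is itself an xts object, so `lapply(chunks, colMeans)` works the same way as with the weekly list.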

edited Oct 16 '15 at 00:00

answered Oct 15 '15 at 23:52

alexwhitworth


Hi Alex, I would like to have the option of using an exponential moving average instead of an SMA. To do that I would need to split the list into groups larger than 5. How can I do that using the split function? – Alex Bădoi Oct 16 '15 at 13:23
Ask a new question and I'll aim to provide a new answer. (I may have some time later today.) – alexwhitworth Oct 16 '15 at 15:46
But add a link in this comment thread so I can easily find the new Q – alexwhitworth Oct 16 '15 at 15:48
http://stackoverflow.com/questions/33179749/how-to-split-an-xts-object-in-multiple-ways – Alex Bădoi Oct 16 '15 at 21:22
