This question is similar but not identical to Add multiple columns to R data.table in one function call?

Let's say I have a data.table

library(data.table)
n <- 100000
ex <- data.table(
  AAA = runif(n), BBBB = runif(n), CCC = runif(n), DDD = runif(n),
  EEE = runif(n), FFF = runif(n), HHH = runif(n), III = runif(n),
  FLAG = rep(c("a", "b", "c", "d", "e"), n / 5)  # same length as the numeric columns
)

I can get the sum and mean of all the columns by doing

ex[,c(sum=lapply(.SD,sum),mean=lapply(.SD,mean)),by=FLAG]

The results look good: there is only 1 row for each value of FLAG, as expected, and the names I specified in `j` are combined with the existing column names for easy identification.
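As a small illustration of where those combined names come from, here is a base R sketch (no data.table needed) that uses a hypothetical two-column data frame as a stand-in for `.SD`. It assumes only the standard behaviour of `c()` when it is given named list arguments.

```r
# Hypothetical stand-in for .SD within one group (toy values, not the question's data).
sd_like <- data.frame(AAA = c(1, 2, 3), BBBB = c(4, 5, 6))

# c() joins each argument name to the names of the list it holds.
combined <- c(sum = lapply(sd_like, sum), mean = lapply(sd_like, mean))

names(combined)
```

Each element of `combined` becomes one output column, so the statistic name and the original column name both survive in the result.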

However, let's say I have a function that returns a list such as

sk<-function(x){
  meanx<-mean(x)
  lenx<-length(x)
  difxmean<-x-meanx
  m4<-sum((difxmean)^4)/lenx
  m3<-sum((difxmean)^3)/lenx
  m2<-sum((difxmean)^2)/lenx
  list(mean=meanx,len=lenx,sd=m2^.5,skew=m3/m2^(3/2),kurt=(m4/m2^2)-3)
}

If I do

ex[,lapply(.SD,sk),by=FLAG]

I get results with a row for each output of the list. I'd like to still have just 1 row of results with columns for each of the original columns and function results.

For example the output columns should be

AAA.mean    AAA.len     AAA.sd     AAA.skew    AAA.kurt       BBBB.mean    BBBB.len     BBBB.sd     BBBB.skew    BBBB.kurt    ....    III.mean    III.len     III.sd     III.skew    III.kurt

Is there a way to do this?

I know I could put each of these individual functions in `j` and get the columns that way, but computing all the moments with this single function is a good bit faster than calling the individual functions:

x<-runif(10000000)
system.time({
mean(x)
length(x)
sd(x)
skewness(x)
kurtosis(x)
})
user  system elapsed 
5.84    0.47    6.30

system.time(sk(x))
user  system elapsed 
3.9     0.1     4.0 
Dean MacGregor

1 Answer

Try this:

ex[, as.list(unlist(lapply(.SD, sk))), by = FLAG]
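To see why this gives one row per group, it may help to walk through the three steps on a single group. The sketch below uses base R only; `sd_like` is a hypothetical stand-in for one group's `.SD`, and `sk` is restated from the question so the block runs on its own.

```r
# Moment helper restated from the question.
sk <- function(x) {
  meanx <- mean(x)
  lenx <- length(x)
  d <- x - meanx
  m2 <- sum(d^2) / lenx
  m3 <- sum(d^3) / lenx
  m4 <- sum(d^4) / lenx
  list(mean = meanx, len = lenx, sd = sqrt(m2),
       skew = m3 / m2^(3/2), kurt = m4 / m2^2 - 3)
}

# Hypothetical stand-in for one group's .SD (toy data, not the question's table).
set.seed(1)
sd_like <- data.frame(AAA = runif(10), BBBB = runif(10))

nested <- lapply(sd_like, sk)  # list of 2 columns, each holding a list of 5 statistics
flat   <- unlist(nested)       # named numeric vector: AAA.mean, AAA.len, ...
row    <- as.list(flat)        # one list element per output column

names(row)
```

Because every element of `row` has length 1, data.table treats the list as a single row for the group. Note that `unlist` coerces everything to a common type, so the integer `len` comes back as a double.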
G. Grothendieck
  • +1. I wonder if there is a way to avoid coercing to list ([as recommended](http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table#don_t_coerce_j_to_list_use_list_directly)). `do.call("c", ...)` and `Reduce("c", ...)` seem to be just as slow. – Frank Jun 01 '13 at 15:33
  • @Frank, `do.call("c", ...)` seems OK, but `Reduce("c", ...)` has the downside of losing an important part of the names. – G. Grothendieck Jun 01 '13 at 15:49
  • This does work but based on @Frank's comments I wonder if there's a way to change the way the function returns results in order to improve this. – Dean MacGregor Jun 03 '13 at 15:38
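
The naming difference between `do.call("c", ...)` and `Reduce("c", ...)` mentioned in the comments can be reproduced with a small base R sketch. The nested list below is a hypothetical stand-in for `lapply(.SD, sk)` with only two statistics per column; it makes no claim about relative speed.

```r
# Hypothetical stand-in for lapply(.SD, sk): one inner list per column.
nested <- list(
  AAA  = list(mean = 0.5, len = 4L),
  BBBB = list(mean = 0.7, len = 4L)
)

# do.call passes the outer names as argument names, so c() prefixes them.
via_docall <- do.call("c", nested)

# Reduce calls c(x, y) without argument names, so the column prefix is dropped.
via_reduce <- Reduce("c", nested)

names(via_docall)
names(via_reduce)
```

In this sketch `do.call("c", nested)` already returns a flat list, so no `as.list` step is needed, and the integer `len` keeps its type.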