Shortcut to use Column in data.table without typing their names

Question

I converted iris data set to data.table format.My aim was to take mean of all columns and group them by Species using data.table.

DT <-as.data.table(iris)

Below is my desired output

DT[,.(mean(Sepal.Length),mean(Sepal.Width),mean(Petal.Length),mean(Petal.Width)),by =.(Species)]
      Species    V1    V2    V3    V4
1:     setosa 5.006 3.428 1.462 0.246
2: versicolor 5.936 2.770 4.260 1.326
3:  virginica 6.588 2.974 5.552 2.026

But typing all the column names using the above manner when we have large number of column is time confusing. I tried this using the below command.But the output is arranged in a slightly different manner

DT[, .(vapply(DT[,!'Species',with=FALSE],mean,FUN.VALUE =1)),by = .(Species)]
               Species       V1
         1:     setosa 5.843333
         2:     setosa 3.057333
         3:     setosa 3.758000
         4:     setosa 1.199333
         5: versicolor 5.843333
         6: versicolor 3.057333
         7: versicolor 3.758000
         8: versicolor 1.199333
         9:  virginica 5.843333
        10:  virginica 3.057333
        11:  virginica 3.758000
        12:  virginica 1.199333

is there any way to use above kind of things and get rid of typing all the column names just to take out the mean and group them by species using data.table

Please Do not suggest to use 'with = FALSE' . I already knew that.

Think this might be a duplicate, but `DT[, lapply(.SD,mean), by=Species, .SDcols=names(DT)[1:4]]` or similar should sort you out. — thelatemail, Jun 16 '16 at 10:54
Possibly a duplicate of http://stackoverflow.com/questions/16783598/r-datatable-apply-a-function-to-a-subset-of-columns until I can find an exact match. — thelatemail, Jun 16 '16 at 10:56
@learner what do you actually want if `.SDcols` is not helpful? — akrun, Jun 16 '16 at 11:17
@learner - it gives exactly the same output as your desired output doesn't it? In fact it can be slightly simplified to `DT[, lapply(.SD,mean), by=Species]` if you just want to exclude the `by=` variable. — thelatemail, Jun 16 '16 at 11:19
Have you gone through the [Introduction to data.table](https://github.com/Rdatatable/data.table/wiki/Getting-started) vignette? Specifically section 2e. — Arun, Jun 16 '16 at 11:28
@learner feed `.SDcols` the vector of column names using negation. Here is one method: `names(iris)[names(iris) != "Species"]`. I'm pretty sure there are cleverer ways. — lmo, Jun 16 '16 at 11:57
To expand this subsetting, just add names, say you don't want to calculate the mean of Sepal.Length as well, then feed `.SDcols` `names(iris)[!(names(iris) %in% c("Species", "Sepal.Length"))]`. — lmo, Jun 16 '16 at 12:04
@learner you can self answer the question to not leave it unresolved. — jangorecki, Jun 17 '16 at 11:32

score 0 · Answer 1 · answered Jul 31 '17 at 14:59

iris <- as.data.table(iris)
iris <- iris[,lapply(.SD,mean), by=Species]

Desired Output:-

structure(list(Species = structure(1:3, .Label = c("setosa", 
"versicolor", "virginica"), class = "factor"), Sepal.Length = c(5.006, 
5.936, 6.588), Sepal.Width = c(3.428, 2.77, 2.974), Petal.Length = c(1.462, 
4.26, 5.552), Petal.Width = c(0.246, 1.326, 2.026)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .Names = c("Species", 
"Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"), 
.internal.selfref = <pointer: 0x0000000003fd0788>)

Suppose you want mean of only "Sepal.Length" and "Sepal.Width" columns. You can do that by slight modification as follows:-

iris <- iris[,lapply(.SD,mean), by=Species, .SDcols = c("Sepal.Length", "Sepal.Width")]

In data.table, DT[i,j,by], j must return a list and lapply returns list

Shortcut to use Column in data.table without typing their names

1 Answers1