Question
How can one apply a function to one or more variables for each unique value of another variable? Something like
dt[,DoStuff(x) ,y]
Example
Consider the mpg data set from ggplot2:
require(data.table)
require(ggplot2)
as.data.table(mpg)
manufacturer model displ year cyl trans drv cty hwy fl class
1: audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
2: audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
3: audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
4: audi a4 2.0 2008 4 auto(av) f 21 30 p compact
5: audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
---
230: volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p midsize
231: volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p midsize
232: volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p midsize
233: volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p midsize
234: volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p midsize
I would like to paste together the unique manufacturer names (separated by underscores) for each unique value of fl. I tried
as.data.table(mpg)[,list(x = function(manufacturer) {paste(unique(manufacturer), collapse="_")} ),fl]
Error in `[.data.table`(as.data.table(mpg), , list(x = function(manufacturer) { :
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
A working alternative in base R is
sapply(unique(mpg$fl), FUN=function(x){paste(unique(mpg$manufacturer[mpg$fl==x]),collapse="_")})
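For comparison, here is a minimal sketch of the grouped form I am after. It assumes that j should contain the evaluated expression itself rather than a function definition, which would be consistent with the error message asking for atomic vectors or lists. The small table dt and the result name res are hypothetical stand-ins for mpg, used only to keep the example self-contained.

```r
library(data.table)

# Hypothetical toy data standing in for mpg (only the two relevant columns)
dt <- data.table(
  manufacturer = c("audi", "audi", "ford", "toyota", "ford"),
  fl           = c("p", "p", "r", "r", "r")
)

# j is evaluated once per group of fl; the expression is called directly
# (not wrapped in function(...)) and returned as a named list column x.
res <- dt[, .(x = paste(unique(manufacturer), collapse = "_")), by = fl]
print(res)
```

If this is right, the same pattern applied to the real data would presumably be as.data.table(mpg)[, .(x = paste(unique(manufacturer), collapse = "_")), by = fl].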