
Question

How can one apply a function to one or more variables for each unique value of another variable? Something like

dt[,DoStuff(x) ,y]

Example

Consider the mpg data set from ggplot2

require(data.table)
require(ggplot2)
as.data.table(mpg)
     manufacturer  model displ year cyl      trans drv cty hwy fl   class
  1:         audi     a4   1.8 1999   4   auto(l5)   f  18  29  p compact
  2:         audi     a4   1.8 1999   4 manual(m5)   f  21  29  p compact
  3:         audi     a4   2.0 2008   4 manual(m6)   f  20  31  p compact
  4:         audi     a4   2.0 2008   4   auto(av)   f  21  30  p compact
  5:         audi     a4   2.8 1999   6   auto(l5)   f  16  26  p compact
 ---                                                                     
230:   volkswagen passat   2.0 2008   4   auto(s6)   f  19  28  p midsize
231:   volkswagen passat   2.0 2008   4 manual(m6)   f  21  29  p midsize
232:   volkswagen passat   2.8 1999   6   auto(l5)   f  16  26  p midsize
233:   volkswagen passat   2.8 1999   6 manual(m5)   f  18  26  p midsize
234:   volkswagen passat   3.6 2008   6   auto(s6)   f  17  26  p midsize

I would like to paste together the unique manufacturer names (separated by underscore) for each unique value of fl. I tried

as.data.table(mpg)[,list(x = function(manufacturer) {paste(unique(manufacturer), collapse="_")} ),fl]

Error in `[.data.table`(as.data.table(mpg), , list(x = function(manufacturer) { : 
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.

An alternative solution is

sapply(unique(mpg$fl), FUN=function(x){paste(unique(mpg$manufacturer[mpg$fl==x]),collapse="_")})
Remi.b

1 Answer


You could try this:

as.data.table(mpg)[,paste(unique(manufacturer),collapse="_"),by=fl]

Or, if your function is more elaborate, you could write it separately:

myfun <- function(x){
  u_x <- unique(x)
  return(paste(u_x,collapse="_"))
}


res <- as.data.table(mpg)[,myfun(manufacturer),by=fl]
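
As a side note on why the attempt in the question failed: there, j returned the function object itself rather than the result of calling it, which is what the "should be atomic vectors or lists" error is complaining about. A minimal sketch, assuming a small made-up table `dt` in place of mpg (the rows and the name `res_inline` are illustrative only), suggests an anonymous function can sit inside j as long as it is actually called on the column:

```r
library(data.table)

# Toy stand-in for mpg: made-up rows, only the two columns needed here
dt <- data.table(
  manufacturer = c("audi", "audi", "ford", "honda", "ford"),
  fl           = c("p", "p", "r", "r", "r")
)

# Define the anonymous function AND call it on the column, all inside j.
# Without the trailing (manufacturer), j would hold a function, not a vector.
res_inline <- dt[
  ,
  .(x = (function(m) paste(unique(m), collapse = "_"))(manufacturer)),
  by = fl
]

print(res_inline)
```

This is usually harder to read than a named function such as `myfun` above, so it is shown only to illustrate the difference between defining and calling.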
Heroka
  • Thank you for your answer. I realize that my example was poorly chosen, because I was interested in applying a function that takes several steps (that can't all be contained in one command), which isn't the case in my example. I might just accept your solution and open a new post with a better example. – Remi.b Oct 31 '15 at 21:48
  • I'm not a data-table expert. But it might be more readable to keep your function separate, and then just call it. See the edit in my answer. – Heroka Oct 31 '15 at 21:50
  • Oh...so we just can't define a function "inside the data.table command" but we must define it separately and then use it. Well...that was simple. Ok, thanks a lot +1 – Remi.b Oct 31 '15 at 22:01
  • I don't know if we 'must' do that (that's always tricky, 'always' and 'never'), but I think this works and makes your code readable. – Heroka Oct 31 '15 at 22:05
  • @Remi.b I believe you could do it by just wrapping the steps in braces, as `j = {...; f(.SD)}`. – jangorecki Oct 31 '15 at 22:30
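
The braces idea from the last comment can be sketched as follows. This is only an illustration under stated assumptions: the table `dt` is made-up toy data standing in for mpg, and `describe_group` is a hypothetical helper, not something from the thread. Several steps run per group inside `{ }`, and the value of the last expression (a list) becomes the result columns:

```r
library(data.table)

# Toy stand-in for mpg: made-up rows for illustration
dt <- data.table(
  manufacturer = c("audi", "audi", "ford", "honda", "ford"),
  cty          = c(18, 21, 15, 24, 17),
  fl           = c("p", "p", "r", "r", "r")
)

# Variant 1: several steps written directly inside j, wrapped in braces
res_block <- dt[, {
  makers <- unique(manufacturer)            # step 1: distinct names in this group
  label  <- paste(makers, collapse = "_")   # step 2: join them with underscores
  list(makers = label, n_makers = length(makers), mean_cty = mean(cty))
}, by = fl]

# Variant 2: a hypothetical helper that receives the group's subset (.SD)
describe_group <- function(sd) {
  list(
    makers   = paste(unique(sd$manufacturer), collapse = "_"),
    mean_cty = mean(sd$cty)
  )
}
res_sd <- dt[, describe_group(.SD), by = fl]

print(res_block)
print(res_sd)
```

Variant 1 keeps everything in one place; variant 2 keeps j short and lets the helper be tested on its own, which is closer to what the answer recommends.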