
I am a new user of the data.table package in R. I am trying to give a name to the new column created by a "group by" command:

> DT = data.table(x=rep(c("a","b"),c(2,3)),y=1:5)
> DT
   x y
1: a 1
2: a 2
3: b 3
4: b 4
5: b 5
> DT[,{z=sum(y);z+3},by=x]
   x V1
1: a  6
2: b 15
  1. I would like to name the V1 (default) column directly, without having to use colnames. Is that possible?
  2. Additionally, is it possible to perform several group-by aggregations in one command, giving something like:

       x V1 V2
    1: a  6 something
    2: b 15 something
    

Thanks

statquant

2 Answers

DT[,list(z=sum(y)+3,a=mean(y*z)),by=x]
   x  z  a
1: a  6  9
2: b 15 60
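A variant of the same call, sketched under the assumption that you want `z` computed once per group and then reused explicitly: a `{ }` block in `j` can store the value in a local variable and return a named `list()`. Both output columns are still named directly (no `setnames` needed), and the second column does not depend on `z` being visible inside the same `list()` call.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)

# Inside { }, z is an ordinary local variable evaluated once per group;
# the last expression (a named list) becomes the result columns.
res <- DT[, {
  z <- sum(y) + 3
  list(z = z, a = mean(y * z))
}, by = x]

print(res)
```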

Since you are new to data.table, I recommend studying the help page for the `setnames` function, as well as `?data.table` and the data.table vignettes.
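For comparison, a minimal sketch of the `setnames` route: an unnamed `j` expression produces the default `V1` column, which `setnames(x, old, new)` then renames in place. The name `total` is only an illustrative choice, not anything from the question.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)

# Unnamed j expression: the result column gets the default name V1.
res <- DT[, sum(y) + 3, by = x]

# Rename V1 by reference (hypothetical new name "total").
setnames(res, "V1", "total")

print(res)
```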

Roland
  • +1 I wasn't sure if statquant wanted to rename the `by` column. I couldn't find either in ?data.table, the FAQs or S.O. easily (amazingly, so will add to ?data.table). If they do want that, it's: `DT[,,by=list(newname=x)]`. – Matt Dowle Nov 23 '12 at 09:32
  • @MatthewDowle Interesting. However, simply using `setnames` keeps the code simple. I usually prefer that over one-liners, which tend to get a bit complicated. – Roland Nov 23 '12 at 09:43
  • Interesting. Ok, I see what you mean. – Matt Dowle Nov 23 '12 at 09:51
  • Thanks guys, that answers both questions. BTW Matthew, this package is a massive help: I am handling a data.frame of 5e6 (5 million) rows, and my ancient box was choking... not anymore with data.table. – statquant Nov 23 '12 at 11:19
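Following Matt Dowle's comment, a sketch that names both the grouping column and the aggregate column in one call, with a concrete `j` filled in for illustration. `newname` comes from his comment; `total` is a hypothetical name chosen here.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)

# Naming the element of the `by` list renames the grouping column;
# naming the element of the `j` list names the aggregate column.
res <- DT[, list(total = sum(y) + 3), by = list(newname = x)]

print(res)
```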

For conciseness, you can now use `.()` instead of `list()`:

DT[, .(z=sum(y)+3, a=mean(y*z)), by=x]
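A quick sketch checking that the two spellings produce the same table, assuming a data.table version recent enough to provide the `.()` alias. The column name `total` is again only illustrative.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)

# .() is shorthand for list() inside DT[...]
with_dot  <- DT[, .(total = sum(y) + 3), by = x]
with_list <- DT[, list(total = sum(y) + 3), by = x]

# all.equal() dispatches to data.table's method and compares contents.
same <- isTRUE(all.equal(with_dot, with_list))
print(same)
```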