0

I am trying to do some aggregation using data.table to get the mean and first value of some columns. For example:

dt <- data.table(mtcars)

dt[, .(disp = mean(disp, na.rm=T),
       hp = mean(hp, na.rm=T)), 
   by=cyl]

out:
   cyl     disp        hp
1:   6 183.3143 122.28571
2:   4 105.1364  82.63636
3:   8 353.1000 209.21429

desired:
   cyl     disp        hp   wt
1:   6 183.3143 122.28571 2.62
2:   4 105.1364  82.63636 2.32
3:   8 353.1000 209.21429 3.44

To extract the first row by itself, this can be done:

dt[, .SD[1], by=cyl][,.(cyl,wt)]

out:
   cyl   wt
1:   6 2.62
2:   4 2.32
3:   8 3.44

But how do I do it with the other aggregate functions?

In dplyr, I will simply do:

mtcars %>% group_by(cyl) %>% 
  summarise(disp = mean(disp, na.rm=T), 
            hp = mean(disp, na.rm=T), 
            wt = first(wt))

out:
  cyl     disp       hp    wt
1   4 105.1364 105.1364  2.32
2   6 183.3143 183.3143  2.62
3   8 353.1000 353.1000  3.44
TYL
  • 1,577
  • 20
  • 33

1 Answers1

2

data.table also has first

library(data.table)

dt[, .(disp = mean(disp, na.rm=T),
       hp = mean(hp, na.rm=T),
       wt = data.table::first(wt)), 
   by=cyl]

   cyl     disp        hp   wt
1:   6 183.3143 122.28571 2.62
2:   4 105.1364  82.63636 2.32
3:   8 353.1000 209.21429 3.44
Waldi
  • 39,242
  • 6
  • 30
  • 78
  • 1
    As a rule of thumbs, `data.table` [can do most functions allowed by `dplyr`](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) or `SQL`, and more. – Waldi Jun 09 '21 at 07:46