What is the correct way to do something like this? I am trying to get the column sums (colSums) within each group for specific columns. The . placeholder does not seem to work with this type of subsetting.

csv<-data.frame(id_num=c(1,1,1,2,2),c(1,2,3,4,5),c(1,2,3,3,3))
temp<-csv%>%group_by(id_num)%>%colSums(.[,2:3],na.rm=T)
Rilcon42
  • There is already `summarise_each` or `summarise_at` in `dplyr` i.e. `csv%>%group_by(id_num)%>% summarise_each(funs(sum))` or `csv%>%group_by(id_num)%>% summarise_at(vars(2:3), sum)` – akrun Nov 05 '16 at 15:24
  • Fantastic, I never knew about the `vars()` syntax before! Would you mind putting that as a complete answer? – Rilcon42 Nov 05 '16 at 15:27

1 Answer

This can be done with summarise_each. Recent versions of dplyr also introduced additional functions such as summarise_at and summarise_if for convenience.

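library(dplyr)

# Sum every non-grouping column within each id_num group
csv %>%
    group_by(id_num) %>%
    summarise_each(funs(sum))

# Sum only the selected columns, given here by position
csv %>%
    group_by(id_num) %>%
    summarise_at(2:3, sum)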

If we are using column names, pass them to summarise_at either as a character vector (as below) or as bare names wrapped in vars

csv %>%
    group_by(id_num) %>%
    summarise_at(names(csv)[-1], sum)

NOTE: In the OP's dataset, the column names for the 2nd and 3rd columns were not specified, so data.frame generated names like c.1..2..3..4..5.
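
To make the note above concrete, here is a small base R sketch that prints the automatically generated names and then assigns explicit ones. The names x and y are hypothetical and chosen only for illustration; they do not appear in the OP's data.

```r
# The OP's data frame: the 2nd and 3rd columns are unnamed
csv <- data.frame(id_num = c(1, 1, 1, 2, 2), c(1, 2, 3, 4, 5), c(1, 2, 3, 3, 3))

# data.frame() builds syntactic names from the unnamed expressions
auto_names <- names(csv)
print(auto_names)

# Assign explicit names (x and y are illustrative assumptions)
csv_named <- csv
names(csv_named)[2:3] <- c("x", "y")
print(names(csv_named))
```

With explicit names, the columns are easier to refer to inside vars or in a character vector.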

Using vars to apply the function to a column selected by name

csv %>%
   group_by(id_num) %>% 
   summarise_at(vars(c.1..2..3..4..5.), sum)
#    # A tibble: 2 × 2
#  id_num c.1..2..3..4..5.
#    <dbl>            <dbl>
#1      1                6
#2      2                9
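
For readers on newer dplyr releases, the same grouped sums can also be written with across inside summarise. This is a minimal sketch, assuming dplyr 1.0.0 or later is installed; the column names x and y are hypothetical stand-ins for the OP's unnamed columns.

```r
library(dplyr)

# Same values as the OP's data, with explicit (illustrative) column names
csv <- data.frame(id_num = c(1, 1, 1, 2, 2),
                  x = c(1, 2, 3, 4, 5),
                  y = c(1, 2, 3, 3, 3))

# Sum the selected columns within each id_num group, ignoring NA values
res <- csv %>%
  group_by(id_num) %>%
  summarise(across(c(x, y), ~ sum(.x, na.rm = TRUE)))

print(res)
```

The lambda form makes it possible to pass na.rm = TRUE, which is what the OP's colSums call was trying to do.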
akrun