What is the right way to reference part of a dataframe after piping?

Question

What is the correct way to do something like this? I am trying to get the colSums of each group for specific columns. The . syntax seems incorrect with this type of subsetting.

csv<-data.frame(id_num=c(1,1,1,2,2),c(1,2,3,4,5),c(1,2,3,3,3))
temp<-csv%>%group_by(id_num)%>%colSums(.[,2:3],na.rm=T)

There is already `summarise_each` or `summarise_at` in `dplyr` i.e. `csv%>%group_by(id_num)%>% summarise_each(funs(sum))` or `csv%>%group_by(id_num)%>% summarise_at(vars(2:3), sum)` — akrun, Nov 05 '16 at 15:24
Fantastic, I never knew about the `vars()` syntax before! Would you mind putting that as a complete answer? — Rilcon42, Nov 05 '16 at 15:27

akrun · Accepted Answer · 2016-11-05T15:38:28.697

This can be done with summarise_each. Recent versions of dplyr also introduced summarise_at and summarise_if for convenience.

csv %>%
    group_by(id_num) %>%
    summarise_each(funs(sum))

csv %>%
     group_by(id_num) %>%
     summarise_at(2:3, sum)

Column names can also be passed to summarise_at as a character vector (bare, unquoted names need to be wrapped in vars instead):

csv %>%
    group_by(id_num) %>%
    summarise_at(names(csv)[-1], sum)

NOTE: In the OP's dataset, the names of the 2nd and 3rd columns were not specified, so data.frame() generated names such as c.1..2..3..4..5.
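
One way to sidestep the auto-generated names is to name the columns when the data frame is built. A minimal sketch follows; `csv2`, `x`, `y` and `res_named` are hypothetical names chosen for illustration and do not appear in the original post.

```r
library(dplyr)

# Same values as the OP's data, but with explicit names for columns 2 and 3.
# x and y are illustrative names, not taken from the original post.
csv2 <- data.frame(id_num = c(1, 1, 1, 2, 2),
                   x = c(1, 2, 3, 4, 5),
                   y = c(1, 2, 3, 3, 3))

# Bare column names are wrapped in vars() for summarise_at
res_named <- csv2 %>%
    group_by(id_num) %>%
    summarise_at(vars(x, y), sum)
```

With readable names in place, the vars() call no longer has to spell out something like c.1..2..3..4..5.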

Use vars to apply the function to columns selected by name:

csv %>%
   group_by(id_num) %>% 
   summarise_at(vars(c.1..2..3..4..5.), sum)
#    # A tibble: 2 × 2
#  id_num c.1..2..3..4..5.
#    <dbl>            <dbl>
#1      1                6
#2      2                9
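
In later dplyr releases (1.0.0 onward, as far as I know), summarise_each and funs are deprecated and the summarise_at family is superseded by across(). A rough equivalent under that assumption is sketched below; `res` is a hypothetical name, and na.rm = TRUE is included only because the OP's attempt used it.

```r
library(dplyr)

# The OP's data, rebuilt so the example is self-contained
csv <- data.frame(id_num = c(1, 1, 1, 2, 2), c(1, 2, 3, 4, 5), c(1, 2, 3, 3, 3))

res <- csv %>%
    group_by(id_num) %>%
    # across() does not select grouping columns, so everything()
    # covers only the two value columns here
    summarise(across(everything(), function(v) sum(v, na.rm = TRUE)))
```

This should return one row per id_num with the per-group sum of each remaining column.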
