How can I use the function summarise_all (from dplyr), but exclude several columns?

I saw some other topics on Stack Overflow, but they didn't give a workable solution (How to apply summarise_each to all columns except one?).

The select() option is not really a solution, because I'm dealing with a lot of columns.

This is my code:

data_frame1 <- data_frame %>% 
  group_by(visitor_id) %>%
  summarise_all(funs(sum), - hit_time_gmt, - visit_start_pagename)

The error that I get:

Error in eval_bare(dot$expr, dot$env) : object 'hit_time_gmt' not found

I don't understand this, because when I check data_frame, the columns hit_time_gmt and visit_start_pagename both exist.

R overflow
  • put those inside `vars` and use `summarise_at` – akrun Feb 07 '18 at 14:59
  • also see `summarize_if` – IceCreamToucan Feb 07 '18 at 15:01
  • or `summarize_at` – Georgery Feb 07 '18 at 15:02
  • Sorry, not sure if I'm doing it right. But is this what you had in mind, @akrun: `data_frame1 <- data_frame %>% group_by(visitor_id) %>% summarise_at(- hit_time_gmt, - visit_start_pagename, funs(sum))` – R overflow Feb 07 '18 at 15:06
  • I meant `data_frame %>% group_by(visitor_id) %>% summarise_at(vars(-hit_time_gmt, -visit_start_pagename), funs(sum(., na.rm = TRUE)))`. The `na.rm` is there to account for NA values, if any. – akrun Feb 07 '18 at 15:08
  • Once again, you saved me. – R overflow Feb 07 '18 at 15:13
  • Alternatively, you could use `data.table`, which is faster anyway. Using `.SDcols` should solve your problem. For multiple column names (a hundred, say), you could always `grep` or use `setdiff` over `colnames`. – Gautam Feb 07 '18 at 15:16
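
For reference, here is a minimal sketch of the approach suggested in the comments. The likely cause of the original error is that `summarise_all()` passes its extra arguments on to the summary function rather than treating them as column selections, so `- hit_time_gmt` is evaluated outside the data. The toy data below is hypothetical: only the names visitor_id, hit_time_gmt and visit_start_pagename come from the question, while `visits`, `page_views` and `purchases` are invented for illustration. Since `funs()` was deprecated in later dplyr releases, the sketch uses a formula instead, and it also shows the `across()` form, assuming dplyr 1.0 or newer is installed.

```r
library(dplyr)

# Hypothetical toy data; page_views and purchases are made-up columns.
visits <- tibble(
  visitor_id           = c("a", "a", "b"),
  hit_time_gmt         = c(1517986800, 1517986860, 1517987000),
  visit_start_pagename = c("home", "home", "search"),
  page_views           = c(3, NA, 5),
  purchases            = c(1, 0, 2)
)

# Approach from the comments: negative selection inside vars().
# The grouping column visitor_id is left out of the selection automatically.
by_visitor_at <- visits %>%
  group_by(visitor_id) %>%
  summarise_at(vars(-hit_time_gmt, -visit_start_pagename),
               ~ sum(.x, na.rm = TRUE))

# Equivalent with across(), which supersedes the summarise_at() family
# in dplyr 1.0 and later.
by_visitor_across <- visits %>%
  group_by(visitor_id) %>%
  summarise(across(-c(hit_time_gmt, visit_start_pagename),
                   ~ sum(.x, na.rm = TRUE)))

print(by_visitor_at)
print(by_visitor_across)
```

Both calls should return one row per visitor_id with summed page_views and purchases, and without the two excluded columns. When the columns to drop are stored in a character vector, `-all_of(excluded)` can replace the bare names inside `across()`.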

0 Answers