I am using dplyr's summarise function. My data contain NAs, so I need to include na.rm = TRUE in each call. For example:

group <- rep(c('a', 'b'), 3)
value <- c(1:4, NA, NA)
df = data.frame(group, value)

library(dplyr)
group_by(df, group) %>% 
  summarise(mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE),
            min = min(value, na.rm = TRUE))

Is there a way to write the argument na.rm = TRUE only once, rather than repeating it in every function call?

Rtist
2 Answers

You should use summarise_at, which lets you compute multiple functions for the supplied columns and set arguments that are shared among them:

# A named list is used here because funs() is deprecated since dplyr 0.8.0
df %>% group_by(group) %>% 
  summarise_at("value", 
               list(mean = mean, sd = sd, min = min), 
               na.rm = TRUE)
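
In dplyr 1.0.0 and later, summarise_at() is superseded by across(). The following is only a rough sketch of the same idea, assuming dplyr >= 1.0.0; the helper name with_na_rm is hypothetical and not part of dplyr. It wraps each function once so that na.rm = TRUE is written in a single place:

```r
library(dplyr)

df <- data.frame(group = rep(c("a", "b"), 3),
                 value = c(1:4, NA, NA))

# Hypothetical helper (not part of dplyr): returns a version of f
# that always drops NAs.
with_na_rm <- function(f) {
  force(f)
  function(x) f(x, na.rm = TRUE)
}

# na.rm = TRUE appears only once, inside with_na_rm(); lapply keeps the names.
fns <- lapply(list(mean = mean, sd = sd, min = min), with_na_rm)

res <- df %>%
  group_by(group) %>%
  summarise(across(value, fns, .names = "{.fn}"))

res
```

The .names = "{.fn}" argument keeps the output columns named mean, sd and min rather than value_mean and so on.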
mtoto

If you plan to apply your functions to one column only, you can use filter(!is.na(value)) to drop the rows where that variable is NA before summarising. NAs in other variables do not affect the result.

group <- rep(c('a', 'b'), 3)
value <- c(1:4, NA, NA)
df = data.frame(group, value)

library(dplyr)

group_by(df, group) %>% 
  filter(!is.na(value)) %>%
  summarise(mean = mean(value),
            sd = sd(value),
            min = min(value))

# # A tibble: 2 x 4
#    group  mean       sd   min
#   <fctr> <dbl>    <dbl> <dbl>
# 1      a     2 1.414214     1
# 2      b     3 1.414214     2
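
One difference from the na.rm = TRUE approach is worth keeping in mind: filtering removes rows, so a group whose values are all NA disappears from the result. A small sketch with made-up data (df2 and its group "c" are hypothetical, not from the question):

```r
library(dplyr)

# Hypothetical data: group "c" contains only NA
df2 <- data.frame(group = c("a", "a", "b", "b", "c"),
                  value = c(1, NA, 2, 4, NA))

# Filtering first: group "c" has no rows left and is dropped
filtered <- df2 %>%
  group_by(group) %>%
  filter(!is.na(value)) %>%
  summarise(mean = mean(value))

# na.rm = TRUE: group "c" is kept, with a mean of NaN
kept <- df2 %>%
  group_by(group) %>%
  summarise(mean = mean(value, na.rm = TRUE))

nrow(filtered)
nrow(kept)
```

Which behaviour is preferable depends on whether all-NA groups should appear in the summary.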
AntoniosK