
I have a data frame md:

md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5),
      device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
md[2,3] <- NA
md[4,1] <- NA
md

I want to calculate means by device1 / device2 combinations using dplyr:

library(dplyr)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean))

However, some of the results are NA. I want the NAs to be ignored (na.rm = TRUE), but I can't find a way to pass this argument. Both of these lines result in an error:

md %>% group_by(device1, device2) %>% summarise_each(funs(mean), na.rm = TRUE)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean, na.rm = TRUE))

Jaap
user2323534

3 Answers

The other answers show the syntax for passing mean(., na.rm = TRUE) into summarise_each().

Personally, I run into this so often, and find it so annoying, that I just define the following set of NA-aware convenience versions of the basic functions (e.g. in my .Rprofile). You can then apply them with dplyr, e.g. summarise_each(funs(mean_)), with no pesky argument-passing. This also keeps the source code cleaner and more readable, which is another strong plus:

# na.rm / use / useNA are formal args, so callers can still override the defaults
mean_   <- function(..., na.rm=T) mean(..., na.rm=na.rm)
median_ <- function(..., na.rm=T) median(..., na.rm=na.rm)
sum_    <- function(..., na.rm=T) sum(..., na.rm=na.rm)
sd_     <- function(v) sqrt(sum_((v-mean_(v))^2) / sum(!is.na(v)))  # population SD (divides by n, not n-1) over non-NA values
cor_    <- function(..., use='pairwise.complete.obs') cor(..., use=use)
max_    <- function(..., na.rm=T) max(..., na.rm=na.rm)
min_    <- function(..., na.rm=T) min(..., na.rm=na.rm)
pmax_   <- function(..., na.rm=T) pmax(..., na.rm=na.rm)
pmin_   <- function(..., na.rm=T) pmin(..., na.rm=na.rm)
table_  <- function(..., useNA='ifany') table(..., useNA=useNA)
mode_   <- function(...) {
  tab <- table(...)
  names(tab[tab==max(tab)]) # table() drops NA by default; ties return every mode
}
clamp_  <- function(..., minval=0, maxval=70) pmax(minval, pmin(maxval,...))
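
A minimal usage sketch for these helpers, assuming dplyr >= 1.0.0 (where across() has replaced summarise_each()/funs()) and the toy data `md` from the question. It redefines mean_ so that it runs on its own, with na.rm as a formal argument so that an explicit na.rm = FALSE overrides the default instead of clashing with it:

```r
library(dplyr)

# NA-aware mean; na.rm is a formal argument, so an explicit na.rm = FALSE still works
mean_ <- function(..., na.rm = TRUE) mean(..., na.rm = na.rm)

# Toy data from the question
md <- data.frame(x = c(3, 5, 4, 5, 3, 5), y = c(5, 5, 5, 4, 4, 1), z = c(1, 3, 4, 3, 5, 5),
                 device1 = c("c", "a", "a", "b", "c", "c"),
                 device2 = c("B", "A", "A", "A", "B", "B"))
md[2, 3] <- NA
md[4, 1] <- NA

# Pass the helper by name: no per-call na.rm needed
by_device <- md %>%
  group_by(device1, device2) %>%
  summarise(across(everything(), mean_), .groups = "drop")
print(by_device)

# The default can still be overridden explicitly
mean_(c(1, NA), na.rm = FALSE)  # returns NA
```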

What you really want is a single global switch, flipped once and for all, in the spirit of na.action/na.pass/na.omit/na.fail, that tells every function what to do with NAs by default, instead of functions throwing errors or behaving inconsistently across packages, as they currently do.

There used to be a CRAN package called Defaults for setting per-function defaults, but it has not been maintained since 2014 (pre-3.x). For more about it, see Setting Function Defaults R on a Project Specific Basis.

smci

  • I really object to the downvoters: this is a solution that took me several years of pain to arrive at. It's compact, readable and elegant, and you can still override defaults with the `...` passthrough args. If the only objection is the naming convention, then propose a better one, already. – smci Dec 13 '15 at 06:02
  • @Jaap: since, as I mentioned, this goes in my ~/.Rprofile along with lots of other boilerplate, I strongly prefer compact to verbose code. Hence `na.rm=T` rather than `na.rm = TRUE`. It's actually more legible when you eliminate non-meaningful whitespace. – smci Sep 17 '19 at 05:29
  • ok, no problem; I thought it would be better to make it more readable – Jaap Sep 17 '19 at 05:33

Try:

library(dplyr)
md %>% group_by(device1, device2) %>%
       summarise_each(funs(mean(., na.rm = TRUE)))
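
summarise_each() and funs() have since been deprecated in dplyr; from dplyr 1.0.0 the same computation is usually written with across(). A minimal sketch of the equivalent, assuming dplyr >= 1.0.0. The purrr-style lambda ~ mean(.x, na.rm = TRUE) plays the role of mean(., na.rm = TRUE) inside funs():

```r
library(dplyr)

# Toy data from the question
md <- data.frame(x = c(3, 5, 4, 5, 3, 5), y = c(5, 5, 5, 4, 4, 1), z = c(1, 3, 4, 3, 5, 5),
                 device1 = c("c", "a", "a", "b", "c", "c"),
                 device2 = c("B", "A", "A", "A", "B", "B"))
md[2, 3] <- NA
md[4, 1] <- NA

# across() applies the lambda to every non-grouping column;
# .x stands for the current column, like `.` inside funs()
res <- md %>%
  group_by(device1, device2) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")
print(res)
```

Group (b, A) has only a missing x, so its mean is NaN even with na.rm = TRUE: removing the NA leaves an empty vector.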
jeremycg

Simple as that:

funs(mean(., na.rm = TRUE))
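
The extra-argument form the question tried (na.rm = TRUE outside funs()) is accepted by summarise_all(), the scoped successor to summarise_each() (itself now superseded by across()), which forwards its `...` to the summary function. A sketch, assuming a dplyr version that provides summarise_all():

```r
library(dplyr)

# Toy data from the question
md <- data.frame(x = c(3, 5, 4, 5, 3, 5), y = c(5, 5, 5, 4, 4, 1), z = c(1, 3, 4, 3, 5, 5),
                 device1 = c("c", "a", "a", "b", "c", "c"),
                 device2 = c("B", "A", "A", "A", "B", "B"))
md[2, 3] <- NA
md[4, 1] <- NA

# Arguments after .funs are passed on to mean() for every non-grouping column
res_all <- md %>%
  group_by(device1, device2) %>%
  summarise_all(mean, na.rm = TRUE)
print(res_all)
```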
zero323