I am trying to get the mean, sum, and count for all columns, grouped by the Class variable. The count with n() (the third statement below) fails with this error:

Error: This function should not be called directly

Class <- c("A","A","A","A","B","B","B","C","C","C","C","C","C")
A<-c(23,33,NA,56,22,34,34,45,65,5,57,75,57)
D<-c(2,133,5,60,23,312,341,25,75,NA,3,9,21)
M<-c(34,35,67,325,46,56,547,47,67,67,68,3,12)

df <- data.frame(Class,A,D,M)
library(dplyr)

system.time(df_sum <- df %>% group_by(Class) %>% summarise_if(is.numeric, sum , na.rm=T))
system.time(df_mean <- df %>% group_by(Class) %>% summarise_if(is.numeric, mean , na.rm=T))

system.time(df_count <- df %>% group_by(Class) %>% summarise_if(is.numeric, n() , na.rm=T))

Please suggest how the statement above should be modified.

Ronak Shah
suny

1 Answer

To get the number of non-NA values in each numeric column, you can use:

library(dplyr)

df %>%
  group_by(Class) %>%
  summarise_if(is.numeric,
               function(x) sum(!is.na(x)))

# output
# A tibble: 3 x 4
#   Class     A     D     M
#   <fct> <int> <int> <int>
# 1 A         3     4     4
# 2 B         3     3     3
# 3 C         6     5     6

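n() is not nearly as flexible: it takes no arguments at all, so it has no na.rm option, and it only returns the number of rows in the current group, NA or not. It also has to be evaluated inside a dplyr verb such as summarise(). Writing n() as an argument to summarise_if() calls it directly, which is what raises the error.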
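
If all three summaries are wanted in one call, something along these lines may work. This is only a sketch and assumes dplyr 1.0.0 or later, where across() and where() are available; the names df_rows and df_all are just illustrative. It also shows n() used inside summarise(), where it counts every row in the group, NA included.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0 for across() and where()

Class <- c("A","A","A","A","B","B","B","C","C","C","C","C","C")
A <- c(23,33,NA,56,22,34,34,45,65,5,57,75,57)
D <- c(2,133,5,60,23,312,341,25,75,NA,3,9,21)
M <- c(34,35,67,325,46,56,547,47,67,67,68,3,12)
df <- data.frame(Class, A, D, M)

# n() is fine inside summarise(): rows per group, NA values included
df_rows <- df %>%
  group_by(Class) %>%
  summarise(rows = n())

# sum, mean and non-NA count for every numeric column in one pass
df_all <- df %>%
  group_by(Class) %>%
  summarise(across(where(is.numeric),
                   list(sum  = ~ sum(.x, na.rm = TRUE),
                        mean = ~ mean(.x, na.rm = TRUE),
                        n    = ~ sum(!is.na(.x)))))
```

With the default naming, the result columns come out as A_sum, A_mean, A_n, D_sum and so on, one row per Class.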

missuse