
I have a data frame whose columns hold different kinds of variables (numeric, character, factor), and I would like to summarise them all at once. It also has an ID column, which I want to count according to the levels of the other columns.

Each character or factor column has its own set of levels, and I would like to know the frequency of IDs for each level. For numeric columns, I would instead like summary statistics such as the mean, sd, and quantiles.

Ideally I would do this with dplyr's group_by() and summarise(), but that requires me to group by one column at a time and then specify whether I want a count with n() or, for numeric columns, summary statistics. SAS has a procedure called PROC FREQ, which is what I am trying to replicate.

df <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6),
  Age = c(20, 30, 45, 60, 70, 18),
  Car = c("Zum", "Yat", "Zum", "Zum", "Yat", "Rel"),
  Side = c("Left", "Right", "Left", "Left", "Right", "Right")
)

What I currently do, one column at a time:

library(dplyr)

df %>% group_by(Car) %>% summarise(n = n())
df %>% group_by(Side) %>% summarise(n = n())
df %>% summarise(mean = mean(Age))

I would like to obtain this in a single output and for many variables. My real data frame contains tens of columns, each of which should be treated as a grouping variable or not depending on its type. In addition, an ID may be repeated across rows, with the same values in the columns to be summarised.

Mr Frog

1 Answer


You could write a function that acts on each column based on its class. Here, we calculate the mean if the column is numeric; otherwise we count how many rows fall into each unique value of the column.

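library(dplyr)

# Loop over every column except the first one (ID).
purrr::map(names(df)[-1], function(x) {
  if (is.numeric(df[[x]])) {
    # Numeric column: return its mean.
    df %>% summarise(mean = mean(.data[[x]]))
  } else {
    # Character/factor column: count rows per level.
    df %>% count(.data[[x]])
  }
})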

#[[1]]
#  mean
#1 40.5

#[[2]]
#  Car n
#1 Rel 1
#2 Yat 2
#3 Zum 3

#[[3]]
#   Side n
#1  Left 3
#2 Right 3
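
The question also asks for sd and quantiles, and mentions that IDs may be repeated. The same idea can be extended to cover both, as in the rough sketch below. The helper name summarise_column(), its id argument, and the output column names (q25, q75, n_ids) are made up for illustration; they are not part of dplyr. It assumes dplyr 1.0 or later and that counting distinct IDs per level is what is wanted.

```r
library(dplyr)
library(purrr)

# Same example data as in the question.
df <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6),
  Age = c(20, 30, 45, 60, 70, 18),
  Car = c("Zum", "Yat", "Zum", "Zum", "Yat", "Rel"),
  Side = c("Left", "Right", "Left", "Left", "Right", "Right")
)

# Hypothetical helper: summary statistics for a numeric column,
# number of distinct IDs per level for anything else.
summarise_column <- function(data, col, id = "ID") {
  if (is.numeric(data[[col]])) {
    data %>%
      summarise(
        mean   = mean(.data[[col]], na.rm = TRUE),
        sd     = sd(.data[[col]], na.rm = TRUE),
        q25    = quantile(.data[[col]], 0.25, na.rm = TRUE, names = FALSE),
        median = median(.data[[col]], na.rm = TRUE),
        q75    = quantile(.data[[col]], 0.75, na.rm = TRUE, names = FALSE)
      )
  } else {
    data %>%
      group_by(across(all_of(col))) %>%
      summarise(n_ids = n_distinct(.data[[id]]), .groups = "drop")
  }
}

# Apply to every column except ID; the list is named by column.
cols <- setdiff(names(df), "ID")
result <- map(set_names(cols), function(col) summarise_column(df, col))

result
```

Because n_distinct() is used, an ID that appears on several rows within a level is counted once; replace it with n() if rows rather than IDs should be counted. The result is still a list with one element per column, since the numeric and categorical summaries have different shapes.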
Ronak Shah