3

As part of my exploratory work I have built a function that provides a variety of metrics for each field in my dataset. I want to apply it to each column of my dataset.

library(tidyverse)
mtcars  %>%summarise_all(., .funs = funs(mean, median, sd, max, min, n_distinct))

However, this results a dataset with 1 row and each function/column combination as a column. The names are also concatenated like 'column_function'.

DESIRED result would be a 'tidy' format like:

ORIGINAL_COLUMN_NAME | FUNCTION | RESULT

I'm guessing there has to be an easy way to do this?

runningbirds
  • 6,235
  • 13
  • 55
  • 94
  • https://stackoverflow.com/questions/27879638/use-dplyrs-summarise-each-to-return-one-row-per-function?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – GordonShumway Apr 09 '18 at 05:15
  • Just note that in the existing answer, variable name is separated on underscore which will cause issues with the variable `n_distinct`. One solution: rename columns containing `n_distinct` after the summarise. – neilfws Apr 09 '18 at 05:22

1 Answers1

2

Here is one option.

library(tidyverse)

mtcars %>%
  gather(Original_Column, Value) %>%
  group_by(Original_Column) %>%
  summarise_all(., .funs = funs(mean, median, sd, max, min, n_distinct)) %>%
  gather(Function, Result, -Original_Column)

# # A tibble: 66 x 3
#    Original_Column Function  Result
#    <chr>           <chr>      <dbl>
#  1 am              mean       0.406
#  2 carb            mean       2.81 
#  3 cyl             mean       6.19 
#  4 disp            mean     231.   
#  5 drat            mean       3.60 
#  6 gear            mean       3.69 
#  7 hp              mean     147.   
#  8 mpg             mean      20.1  
#  9 qsec            mean      17.8  
# 10 vs              mean       0.438
# # ... with 56 more rows
www
  • 38,575
  • 12
  • 48
  • 84