I'm creating summary tables based on subgroups and would like to add an overall summary in a tidier, more efficient manner.

Here is what I have so far. I've created summaries for each level of my factor variable.

library(tidyverse)

df <- data.frame(var1 = 10:18, 
                 var2 = c("A","B","A","B","A","B","A","B","A"))

group_summary <- df %>% group_by(var2) %>% 
                 filter(var2 != "NA") %>% 
                 summarise("Max" = max(var1, na.rm = TRUE),
                           "Median" = median(var1, na.rm = TRUE),
                           "Min" = min(var1, na.rm = TRUE),
                           "IQR" = IQR(var1, na.rm = TRUE),
                           "Count" = n())

Next I created an overall summary.

Summary <- df %>% 
           filter(var2 != "NA") %>% 
           summarise("Max" = max(var1, na.rm = TRUE),
           "Median" = median(var1, na.rm = TRUE),
           "Min" = min(var1, na.rm = TRUE),
           "IQR" = IQR(var1, na.rm = TRUE),
           "Count" = n())

Finally, I bound the two objects with dplyr::bind_rows:

complete_summary <- bind_rows(Summary, group_summary)

What I've done works, but it is very verbose and can't be the most efficient way. I tried to use ungroup instead:

  group_summary <- df %>% group_by(var2) %>% 
                 filter(var2 != "NA") %>% 
                 summarise("Max" = max(var1, na.rm = TRUE),
                           "Median" = median(var1, na.rm = TRUE),
                           "Min" = min(var1, na.rm = TRUE),
                           "IQR" = IQR(var1, na.rm = TRUE),
                           "Count" = n()) %>% ungroup %>% 
                 summarise("Max" = max(var1, na.rm = TRUE),
                           "Median" = median(var1, na.rm = TRUE),
                           "Min" = min(var1, na.rm = TRUE),
                           "IQR" = IQR(var1, na.rm = TRUE),
                           "Count" = n())

but it threw an error:

  Evaluation error: object var1 not found.

Thanks in advance for your assistance.

Evan

2 Answers

Not the most elegant solution either, but simple:

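library(dplyr)

# Attach the overall statistics as extra columns before grouping, then
# carry them through summarise() with first(). The result is named
# wt_summary so that it does not mask base::c().
wt_summary <- mtcars %>%
  mutate(total_mean = mean(wt),
         total_median = median(wt)) %>%
  group_by(cyl) %>%
  summarise(meanweight = mean(wt),
            medianweight = median(wt),
            total_mean = first(total_mean),
            total_median = first(total_median))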
yoland

If you want to do it in one chain, you can use bind_rows to combine both results, just as you've done, but without the temporary objects you created.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.5.2

df <- data.frame(var1 = 10:18, 
                 var2 = c("A","B","A","B","A","B","A","B","A"))



df %>% group_by(var2) %>% 
  filter(var2 != "NA") %>% 
  summarise("Max" = max(var1, na.rm = TRUE),
            "Median" = median(var1, na.rm = TRUE),
            "Min" = min(var1, na.rm = TRUE),
            "IQR" = IQR(var1, na.rm = TRUE),
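            "Count" = n()) %>%
  # Append the overall row; it has no var2 column, so var2 is NA there.
  # The same filter is applied so the overall Count matches the groups.
  bind_rows(df %>% filter(var2 != "NA") %>%
              summarise("Max" = max(var1, na.rm = TRUE),
                        "Median" = median(var1, na.rm = TRUE),
                        "Min" = min(var1, na.rm = TRUE),
                        "IQR" = IQR(var1, na.rm = TRUE),
                        "Count" = n()))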
#> # A tibble: 3 x 6
#>   var2    Max Median   Min   IQR Count
#>   <fct> <dbl>  <dbl> <dbl> <dbl> <int>
#> 1 A        18     14    10     4     5
#> 2 B        17     14    11     3     4
#> 3 <NA>     18     14    10     4     9

Created on 2019-01-29 by the reprex package (v0.2.1)
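
As for why the ungroup attempt in the question fails: summarise() keeps only the grouping column and the newly created summary columns, so var1 no longer exists by the time the second summarise() runs, and ungroup() cannot bring it back. A minimal sketch to check this, reusing the question's df with a shortened summary:

```r
library(dplyr)

df <- data.frame(var1 = 10:18,
                 var2 = c("A","B","A","B","A","B","A","B","A"))

group_summary <- df %>%
  group_by(var2) %>%
  summarise(Max = max(var1, na.rm = TRUE),
            Count = n())

# Only the grouping column and the summary columns survive summarise()
names(group_summary)
#> [1] "var2"  "Max"   "Count"

# var1 is gone, so a further summarise() that refers to it will fail
"var1" %in% names(group_summary)
#> [1] FALSE
```

That is why the overall row has to be computed from the original df and then bound on, as above.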

amrrs
  • That worked like a charm! Now my project is going to be to turn it into a function so that I don't have to keep copying and pasting it so many times. Cheers! – Evan Jan 30 '19 at 05:28
  • This thread will help you do it more simply: https://stackoverflow.com/questions/9847054/how-to-get-summary-statistics-by-group – amrrs Jan 30 '19 at 07:13
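
On turning this into a function, one possible sketch follows. The name summarise_with_overall and its arguments are hypothetical and not from the thread, and it assumes a dplyr/rlang version that supports the {{ }} operator (rlang 0.4.0 or later). Instead of summarising twice, it stacks a copy of the data labelled "Overall" under the original, so the summary is written only once. It also drops real NA values in the grouping column with is.na(), which is an assumption about what the question's filter was meant to do.

```r
library(dplyr)

# Hypothetical helper: per-group summary plus one "Overall" row.
# `group` and `value` are unquoted column names.
summarise_with_overall <- function(data, group, value) {
  # Assumption: rows with a missing group should be excluded everywhere
  data <- filter(data, !is.na({{ group }}))

  bind_rows(
    # Original rows, with the group coerced to character so the
    # two pieces bind cleanly
    mutate(data, {{ group }} := as.character({{ group }})),
    # A second copy of every row, relabelled as the overall group
    mutate(data, {{ group }} := "Overall")
  ) %>%
    group_by({{ group }}) %>%
    summarise(Max    = max({{ value }}, na.rm = TRUE),
              Median = median({{ value }}, na.rm = TRUE),
              Min    = min({{ value }}, na.rm = TRUE),
              IQR    = IQR({{ value }}, na.rm = TRUE),
              Count  = n())
}

df <- data.frame(var1 = 10:18,
                 var2 = c("A","B","A","B","A","B","A","B","A"))

result <- summarise_with_overall(df, var2, var1)
result
```

For the example df this should give the same three rows as the answer above, with the last row labelled "Overall" rather than NA. The trade-off is that the data is temporarily duplicated in memory, which may matter for large tables; binding two separate summaries avoids that at the cost of repeating the summarise() call.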