
I'm trying to write a function that uses dplyr to summarize counts and percentages for multiple categorical variables, grouped by a single variable.

The function seems to be doing what I want:

library(dplyr)
library(tidyr)  # for pivot_wider()

#toy dataset
meals <- data.frame(
  day = rep(1:6, each = 2),
  parent = rep(c("mom", "dad"), 6),
  breakfast = sample(c("cereal", "banana", "yogurt"), size = 12, replace = TRUE), 
  lunch = sample(c("sandwich", "soup"), size = 12, replace = TRUE)
)

#function to print summary tables - works
summarytable <- function(data, dep, ind) {
  for (var in ind) {
    
    x <- data %>%
      group_by({{dep}}) %>% 
      count(.data[[var]] ) %>% 
      mutate(pct = n/sum(n), total = paste0(n, " (", pct, ")")) %>% 
      select(-n, -pct) %>% 
      pivot_wider(names_from = {{dep}}, values_from = total)
    
    print(x)
  }
}

meals %>%  summarytable(dep = parent, ind = c("breakfast", "lunch"))
# A tibble: 3 x 3
  breakfast dad                   mom                  
  <fct>     <chr>                 <chr>                
1 banana    3 (0.5)               2 (0.333333333333333)
2 cereal    2 (0.333333333333333) 1 (0.166666666666667)
3 yogurt    1 (0.166666666666667) 3 (0.5)              
# A tibble: 2 x 3
  lunch    dad                   mom                  
  <fct>    <chr>                 <chr>                
1 sandwich 4 (0.666666666666667) 2 (0.333333333333333)
2 soup     2 (0.333333333333333) 4 (0.666666666666667)

But when I try to save the tables in a list instead, the list stays empty.

#function to save output in a list - not working
summarytable2 <- function(data, dep, ind) {
  for (var in ind) {
    
    x <- data %>%
      group_by({{dep}}) %>% 
      count(.data[[var]] ) %>% 
      mutate(pct = n/sum(n), total = paste0(n, " (", pct, ")")) %>% 
      select(-n, -pct) %>% 
      pivot_wider(names_from = {{dep}}, values_from = total)
    
    outList[[var]] <- x
  }
}

outList <- list()
meals %>%  summarytable2(dep = parent, ind = c("breakfast", "lunch"))
outList
list()

I tried pre-specifying the number of elements in outList as suggested here and adding an iterator (here), but still no luck.

Any suggestions would be greatly appreciated!

nebulous

1 Answer


We can create `outList` inside the function and return it as the output:

library(dplyr)
library(tidyr)        # for pivot_wider()
library(formattable)  # for percent()
summarytable2 <- function(data, dep, ind) {
  outList <- vector('list', length(ind))
  names(outList) <- ind
  for (var in ind) {
    
    x <- data %>%
      group_by({{dep}}) %>% 
      count(.data[[var]] ) %>% 
      mutate(pct = percent(n/sum(n), 1), total = paste0(n, " (", pct, ")")) %>% 
      select(-n, -pct) %>% 
      pivot_wider(names_from = {{dep}}, values_from = total)
    
    outList[[var]] <- x
  }
  return(outList)
}

Checking:

summarytable2(meals, dep = parent, ind = c("breakfast", "lunch"))
#$breakfast
# A tibble: 3 x 3
#  breakfast dad       mom      
#  <chr>     <chr>     <chr>    
#1 banana    1 (16.7%) 3 (50.0%)
#2 cereal    2 (33.3%) 2 (33.3%)
#3 yogurt    3 (50.0%) 1 (16.7%)

#$lunch
# A tibble: 2 x 3
#  lunch    dad       mom      
#  <chr>    <chr>     <chr>    
#1 sandwich 1 (16.7%) 3 (50.0%)
#2 soup     5 (83.3%) 3 (50.0%)
akrun
  • Thanks @akrun! Works great. I tried creating outList and returning it within the for loop, but it seems you have to do it within the function but outside the loop. Why is that? Also, I edited my question so it wasn't dependent on formattable as you were submitting your answer. – nebulous Sep 18 '20 at 19:43
  • @nebulous `percent` is available from several different packages. I used `formattable`, but it should work with your package as well – akrun Sep 18 '20 at 19:44
  • @nebulous You could also create outList outside the function, but make sure to pass it to the function as an argument, since a function's environment can have scoping issues when it works with objects from the global env. I would prefer to create it inside the function so that nothing breaks when the global env changes – akrun Sep 18 '20 at 19:46
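
The scoping point in the comments can be seen in a minimal base R sketch. The helper names `fill_local` and `fill_returned` are hypothetical and only mirror the shape of the two versions of `summarytable2`. An assignment such as `outList[[k]] <- ...` inside a function modifies a copy local to that call, so the global `outList` stays empty unless the function returns the list and the caller stores it. Calling `return()` inside the loop would also not help, because it exits the function on the first iteration.

```r
outList <- list()

# Hypothetical helper: assigns into outList but never returns it,
# like the version in the question.
fill_local <- function(keys) {
  for (k in keys) {
    outList[[k]] <- toupper(k)  # modifies a copy local to this call
  }
  invisible(NULL)
}

fill_local(c("breakfast", "lunch"))
length(outList)  # the global list is still empty

# Hypothetical helper: builds the list inside the function, outside the
# loop, and returns it once the loop has finished.
fill_returned <- function(keys) {
  out <- vector("list", length(keys))
  names(out) <- keys
  for (k in keys) {
    out[[k]] <- toupper(k)
  }
  out
}

result <- fill_returned(c("breakfast", "lunch"))
names(result)
```

Assigning the return value (`result <- ...`) is what makes the filled list visible to the caller; the same applies to `summarytable2()` in the answer.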