
Say I have a data frame like this in R:

```r
df <- data.frame(factor1 = c("A","B","B","C"),
                 factor2 = c("M","F","F","F"),
                 factor3 = c("0", "1","1","0"),
                 value = c(23,32,4,1))
```

I want to get a summary statistic in dplyr grouped by one variable, like so (but more complicated):

```r
library(dplyr)

df %>%
    group_by(factor1) %>%
    summarize(mean = mean(value))
```

Now I'd like to do this for every factor column (think 100 factor variables). Is there a way to do this within dplyr? I was also thinking of writing a for loop over `names(df)`, but then I get the variable names as strings, and `group_by()` doesn't accept strings.

Alexandru Papiu

1 Answer


Just put your data in long form.

```r
library(dplyr)
library(tidyr)

df %>% gather(key = factor, value = level, -value) %>%
    group_by(factor, level) %>%
    summarize(mean = mean(value))

#    factor level     mean
#     (chr) (chr)    (dbl)
# 1 factor1     A 23.00000
# 2 factor1     B 18.00000
# 3 factor1     C  1.00000
# 4 factor2     F 12.33333
# 5 factor2     M 23.00000
# 6 factor3     0 12.00000
# 7 factor3     1 18.00000
```
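In current tidyr, `gather()` is superseded by `pivot_longer()`. A minimal sketch of the same long-form approach with `pivot_longer()`, assuming tidyr >= 1.0 and dplyr >= 1.0 (for the `.groups` argument); `stringsAsFactors = FALSE` is added only so the columns are character on older versions of R as well:

```r
library(dplyr)
library(tidyr)

df <- data.frame(factor1 = c("A","B","B","C"),
                 factor2 = c("M","F","F","F"),
                 factor3 = c("0", "1","1","0"),
                 value = c(23,32,4,1),
                 stringsAsFactors = FALSE)

# Stack every column except `value` into two columns:
# `factor` holds the original column name, `level` holds the cell value.
result <- df %>%
  pivot_longer(-value, names_to = "factor", values_to = "level") %>%
  group_by(factor, level) %>%
  summarize(mean = mean(value), .groups = "drop")

result
```

This should give the same seven factor/level means as the `gather()` version; the long form scales to 100 factor columns without any change to the code.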

If you do want to build a loop instead, the "Programming with dplyr" vignette is the right place to start.
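As a rough sketch of the loop approach that vignette describes: the `.data` pronoun lets `group_by()` look up a column from a name stored as a string. This assumes dplyr >= 1.0 and that every column other than `value` is a grouping factor; `factor_cols`, `results`, and the output column name `level` are illustrative choices, not part of dplyr:

```r
library(dplyr)

df <- data.frame(factor1 = c("A","B","B","C"),
                 factor2 = c("M","F","F","F"),
                 factor3 = c("0", "1","1","0"),
                 value = c(23,32,4,1),
                 stringsAsFactors = FALSE)

# Assumption: every column except `value` is a grouping factor.
factor_cols <- setdiff(names(df), "value")

# One summary data frame per factor column, collected in a named list.
results <- list()
for (col in factor_cols) {
  results[[col]] <- df %>%
    # .data[[col]] resolves the string in `col` to the actual column;
    # naming it `level` gives every summary the same column layout.
    group_by(level = .data[[col]]) %>%
    summarize(mean = mean(value), .groups = "drop")
}

results$factor2
```

The long-form answer above is usually simpler; the loop mainly helps when each factor needs different handling.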

Gregor Thomas
  • This is great, thanks! Do you know if you could do this in a for loop, iterating over the column names? It might be useful to have a list of data frames, one for each factor. – Alexandru Papiu Mar 29 '16 at 04:06
  • Why do you want to loop? If you want a list of data frames at the end take the result above and `split(result, result$factor)`. – Gregor Thomas Mar 29 '16 at 05:05
  • It might come in handy in other situations. I guess I am curious more generally to see how you would use column names inside dplyr without naming them. For example: `df%>%filter(names(df)[1] == "A")` doesn't work but perhaps something similar would? – Alexandru Papiu Mar 29 '16 at 07:52
  • @Gregor It is not clear to me why you are recommending the hybrid evaluation vignette. Perhaps you meant the [non-standard evaluation vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html)? – tchakravarty Jul 22 '16 at 07:00
  • That is indeed what I mean. – Gregor Thomas Jul 22 '16 at 16:09
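A sketch of the two follow-ups raised in the comments, assuming dplyr >= 1.0: `split()` turns the long summary into a list of data frames, one per factor, and `.data[[...]]` makes `filter()` compare the column whose name is stored in a string. `by_factor`, `first_col`, and `filtered` are illustrative names:

```r
library(dplyr)
library(tidyr)

df <- data.frame(factor1 = c("A","B","B","C"),
                 factor2 = c("M","F","F","F"),
                 factor3 = c("0", "1","1","0"),
                 value = c(23,32,4,1),
                 stringsAsFactors = FALSE)

# The long-form summary from the answer.
result <- df %>%
  gather(key = factor, value = level, -value) %>%
  group_by(factor, level) %>%
  summarize(mean = mean(value), .groups = "drop")

# Named list with one data frame per factor column.
by_factor <- split(result, result$factor)

# filter(names(df)[1] == "A") compares the string "factor1" with "A",
# which is always FALSE; .data[[ ]] compares the column's values instead.
first_col <- names(df)[1]
filtered <- df %>% filter(.data[[first_col]] == "A")
```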