I have a dataframe organized as follows:

Variable 1 | Variable 2 | Variable 3 | Outcome Variable
---------- | ---------- | ---------- | ----------------
Factor     | Factor     | Factor     | Outcome 

with a few thousand rows, 15 variable columns, and 1 outcome column. I would like to summarize the table (preferably using plyr) in the following long format:

Variable 1 | Variable 2 | Variable 3 | Outcome Variable
---------- | ---------- | ---------- | ----------------
Factor 1   | Factor 1   | Factor 1   | Average Outcome 
Factor 1   | Factor 1   | Factor 2   | Average Outcome 
Factor 1   | Factor 2   | Factor 1   | Average Outcome 
Factor 1   | Factor 2   | Factor 2   | Average Outcome

with one row for each combination of factor levels. What is the easiest way to do this?

user282041
    You can use aggregate in base R. `aggregate(outcome ~ fac1 + fac2 + fac3, data=dat, FUN=mean)`. – lmo May 19 '17 at 13:55

1 Answer

We can use dplyr, grouping by the factor columns and taking the mean of the outcome within each group:

library(dplyr)
df1 %>%
    group_by(variable1, variable2, variable3) %>%
    summarise(OutcomeVariable = mean(OutcomeVariable))

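Or with base R, where the `.` on the right-hand side of the formula stands for every column other than the outcome: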

aggregate(OutcomeVariable ~., df1, FUN = mean)
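
To see what the result looks like, here is a minimal sketch on made-up data. The column names and values are purely illustrative (only three grouping columns instead of 15), and the plyr call is one possible way to do the same thing with the package the question asks about; it is skipped if plyr is not installed.

```r
# Toy data: three grouping columns and a numeric outcome.
# All names and values here are hypothetical stand-ins for the real table.
df1 <- data.frame(
  variable1 = c("a", "a", "a", "b", "b", "b"),
  variable2 = c("x", "x", "y", "x", "y", "y"),
  variable3 = c("p", "p", "p", "q", "q", "q"),
  OutcomeVariable = c(1, 3, 5, 2, 4, 6)
)

# Base R: mean outcome for every observed combination of the other columns.
res <- aggregate(OutcomeVariable ~ ., data = df1, FUN = mean)
print(res)

# plyr equivalent, run only when the package is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  res_plyr <- plyr::ddply(
    df1,
    c("variable1", "variable2", "variable3"),  # grouping columns
    plyr::summarise,
    OutcomeVariable = mean(OutcomeVariable)
  )
  print(res_plyr)
}
```

The toy data contains four distinct combinations, so the result has four rows, one mean per combination. Only combinations that actually occur in the data are returned.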
akrun