I want to summarize a data frame according to values in two columns, except each column actually has the same variable in it and I don't care what the order is. Here is a sample data frame:
> df<-data.frame(val1=c(1,1,1,2,2), val2=c(1,2,2,1,1), val3=c(10, 20, 30, 40, 50))
> df
val1 val2 val3
1 1 1 10
2 1 2 20
3 1 2 30
4 2 1 40
5 2 1 50
I am interested in summarizing over val3
except I only care about the unordered combination of val1
and val2
. So for instance I want to treat
val1 val2 val3
1 1 2 20
the same as
val1 val2 val3
1 2 1 40
I can use dplyr to group by columns and summarize like this:
> df %>% group_by(val1, val2) %>% summarize(mean=mean(val3))
Source: local data frame [3 x 3]
Groups: val1 [?]
val1 val2 mean
<dbl> <dbl> <dbl>
1 1 1 10
2 1 2 25
3 2 1 45
But I want this result to only contain two rows!
I've thought about creating a new variable that reflects the combination of val1
and val2
and then group by that, but can't figure out an easy way to this. val1
and val2
don't have to be numeric and likely won't always be consecutive integers.