I am writing a dynamic permutation function to create order-independent parameters. Outside of a function, I have been able to hard-code this approach with dplyr. I want to generalize it so that the same function can permute 3 factors or 6 factors without my typing out all of the repeated calls, but I have not figured out how to make it work.
Here's a simple data frame df of all the permutations of 3 variables:
#> dput(df)
structure(list(var1 = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor"), var2 = structure(c(2L, 3L, 1L,
3L, 1L, 2L), .Label = c("a", "b", "c"), class = "factor"), var3 = structure(c(3L,
2L, 3L, 1L, 2L, 1L), .Label = c("a", "b", "c"), class = "factor"),
X1 = c(0.5, 0.5, 0.8, 0.8, 0.3, 0.3), X2 = c(0.8, 0.3, 0.5,
0.3, 0.5, 0.8), X3 = c(0.3, 0.8, 0.3, 0.5, 0.8, 0.5)), .Names = c("var1",
"var2", "var3", "X1", "X2", "X3"), row.names = c(NA, -6L), class = "data.frame")
My goal is to get the average order-independent value of each variable. To get there, I need to create two sets of intermediate variables: a multiplication (m1, m2, m3, m4) and a subtraction (s1, s2, s3, s4). The variables m1 and s1 are special: m1 = X1 and s1 = X1 - 1. The others need to refer to the one before: m2 = X2*X1 and s2 = m2 - m1.
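For concreteness, the recurrence described above can be sketched in base R on the X columns of the example data. This is only an illustration of the definitions, not the dplyr solution being asked for; the matrices X, m and s are names introduced here for the sketch.

```r
# X columns copied from the dput output above
X <- cbind(
  X1 = c(0.5, 0.5, 0.8, 0.8, 0.3, 0.3),
  X2 = c(0.8, 0.3, 0.5, 0.3, 0.5, 0.8),
  X3 = c(0.3, 0.8, 0.3, 0.5, 0.8, 0.5)
)
n_params <- ncol(X)

# m1 = X1, then each m_n is the previous m times X_n
m <- X
for (n in seq_len(n_params)[-1]) {
  m[, n] <- m[, n - 1] * X[, n]
}
colnames(m) <- paste0("m", seq_len(n_params))

# s1 = m1 - 1, then each s_n = m_n - m_(n-1)
s <- m - cbind(1, m[, -n_params, drop = FALSE])
colnames(s) <- paste0("s", seq_len(n_params))

# First row: m = 0.5, 0.4, 0.12 and s = -0.5, -0.1, -0.28
print(cbind(m, s))
```

Within a row the s values telescope, so they sum to the last m minus 1, which gives a quick sanity check on any implementation.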
I tried to combine the ideas from this SO question, "R - dplyr - mutate - use dynamic variable names", with lazyeval's interp, so that I could dynamically refer to the other variables and also dynamically name the mutated columns. However, it only kept the last column, and the rename did not work, so I got a single additional column named, for example, X2*X3 in this example with 3 variables. When I had 5, it gave a single additional column, X4*X5.
for(n in 2:n_params) {
  varname <- paste("m", n, sep=".")
  df <- mutate_(df, .dots = setNames(interp(~one*two,
                                            one=as.name(paste0("X",n-1)),
                                            two=as.name(paste0("X",n))),
                                     varname))
  df
}
Since I cannot figure out why this does not work, I have set up a series of if statements that calculate the m and s variables.
xx <- data.frame(df) %>%
  mutate(m1 = X1,
         s1 = X1 - 1)
if(n_params >= 2) {
  xx <- data.frame(xx) %>%
    mutate(m2 = m1 * X2,
           s2 = m2 - m1)
}
if(n_params >= 3) {
  xx <- data.frame(xx) %>%
    mutate(m3 = m2 * X3,
           s3 = m3 - m2)
}
if(n_params >= 4) {
  xx <- data.frame(xx) %>%
    mutate(m4 = m3 * X4,
           s4 = m4 - m3)
}
if(n_params >= 5) {
  xx <- data.frame(xx) %>%
    mutate(m5 = m4 * X5,
           s5 = m5 - m4)
}
if(n_params >= 6) {
  xx <- data.frame(xx) %>%
    mutate(m6 = m5 * X6,
           s6 = m6 - m5)
}
It seems like I should be able to write a function that creates this. In pseudocode:
function(n_params) {
  function(x) {
    new_df <- df %>%
      mutate(m1 = X1,
             s1 = X1 - 1)
    for(i in 2:n_params) {
      # pseudocode: m_i and s_i refer to the previous m, as in the if statements above
      new_df <- append(call to new_df,
                       mutate(m_i = X_i * m_(i-1),
                              s_i = m_i - m_(i-1)))
    }
  }
}
However, I cannot figure out how to combine the lazyeval interp and the setNames to allow for referring to the previous mutated value.
I could just leave it in if statements, but I'd love to make this more compact if possible.
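One possible compact form of the if chain is sketched below. It swaps lazyeval's interp and setNames for dplyr's tidy evaluation tools (`!!` with `:=` for dynamic column names, and the `.data` pronoun for looking up columns by string), so it assumes a reasonably recent dplyr. The helper name add_ms is hypothetical, and the data frame holds only the X columns from the example.

```r
library(dplyr)

# X columns copied from the dput output above
df <- data.frame(
  X1 = c(0.5, 0.5, 0.8, 0.8, 0.3, 0.3),
  X2 = c(0.8, 0.3, 0.5, 0.3, 0.5, 0.8),
  X3 = c(0.3, 0.8, 0.3, 0.5, 0.8, 0.5)
)

# add_ms is a hypothetical helper name, not part of dplyr
add_ms <- function(data, n_params) {
  out <- data %>%
    mutate(m1 = X1,
           s1 = X1 - 1)
  # seq_len(n_params)[-1] is empty when n_params is 1, so the loop is skipped
  for (n in seq_len(n_params)[-1]) {
    m_prev <- paste0("m", n - 1)
    m_cur  <- paste0("m", n)
    s_cur  <- paste0("s", n)
    x_cur  <- paste0("X", n)
    out <- out %>%
      mutate(!!m_cur := .data[[m_prev]] * .data[[x_cur]],
             !!s_cur := .data[[m_cur]] - .data[[m_prev]])
  }
  out
}

xx <- add_ms(df, n_params = 3)
print(xx)
```

Each pass through the loop adds one m column and one s column, so the same call should cover 3 or 6 factors as long as the matching X columns exist.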
The final output of interest is the average s value over all permutations for each initial variable. I do that in a separate function.
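As a rough illustration of that last step, here is a base R sketch. It assumes that the s value in position n of a row belongs to the variable named in that row's var column for position n (s1 with var1, s2 with var2, and so on); that pairing is an assumption made for this sketch, since the separate function is not shown. The names s_long, var_long and avg_s are introduced here.

```r
# Example data copied from the dput output above (factors written as strings)
df <- data.frame(
  var1 = c("a", "a", "b", "b", "c", "c"),
  var2 = c("b", "c", "a", "c", "a", "b"),
  var3 = c("c", "b", "c", "a", "b", "a"),
  X1 = c(0.5, 0.5, 0.8, 0.8, 0.3, 0.3),
  X2 = c(0.8, 0.3, 0.5, 0.3, 0.5, 0.8),
  X3 = c(0.3, 0.8, 0.3, 0.5, 0.8, 0.5),
  stringsAsFactors = FALSE
)
n_params <- 3

# Walk the positions, collecting each s value next to the variable it is assumed to belong to
m_prev <- 1                      # starting at 1 makes s1 = X1 - 1
s_long <- numeric(0)
var_long <- character(0)
for (n in seq_len(n_params)) {
  m_cur <- m_prev * df[[paste0("X", n)]]
  s_long <- c(s_long, m_cur - m_prev)
  var_long <- c(var_long, as.character(df[[paste0("var", n)]]))
  m_prev <- m_cur
}

# Average s per initial variable over all permutations
avg_s <- tapply(s_long, var_long, mean)
print(avg_s)
```

Under that assumption, the per-variable averages add up to the product of all X values minus 1, because the s values in each row sum to that amount.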