I am writing a dynamic permutation function to create order-independent parameters. Outside of a function, I have been able to hard-code this approach with dplyr. I want to generalize it so that the same function can handle 3 factors or 6 factors without typing out all of the repeated calls, but I have not figured out how to make it work.

Here's a simple data frame df of all the permutations of 3 variables:

#> dput(df)
structure(list(var1 = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("a", 
"b", "c"), class = "factor"), var2 = structure(c(2L, 3L, 1L, 
3L, 1L, 2L), .Label = c("a", "b", "c"), class = "factor"), var3 =     structure(c(3L, 
2L, 3L, 1L, 2L, 1L), .Label = c("a", "b", "c"), class = "factor"), 
    X1 = c(0.5, 0.5, 0.8, 0.8, 0.3, 0.3), X2 = c(0.8, 0.3, 0.5, 
    0.3, 0.5, 0.8), X3 = c(0.3, 0.8, 0.3, 0.5, 0.8, 0.5)), .Names = c("var1", 
"var2", "var3", "X1", "X2", "X3"), row.names = c(NA, -6L), class = "data.frame")

My goal is to get the average order-independent value of each variable. To get there, I need to create two sets of intermediate variables: products m1, m2, m3, ... and differences s1, s2, s3, .... The first pair is special: m1 = X1 and s1 = X1 - 1. Each later pair refers to the one before it: m2 = m1 * X2 (that is, X1 * X2) and s2 = m2 - m1.
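As a quick check of these definitions, here is a minimal base R sketch for a single row. It assumes the X values of the first row of df (0.5, 0.8, 0.3) and treats the value before m1 as 1, which is what makes s1 = X1 - 1. The vector names x, m, and s are only illustrative.

```r
# Single-row check: X1, X2, X3 taken from the first row of df
x <- c(X1 = 0.5, X2 = 0.8, X3 = 0.3)

# m_i is the running product: m1 = X1, m2 = m1 * X2, m3 = m2 * X3
m <- cumprod(x)

# s_i is the change from the previous m; the starting value is taken as 1
# (an assumption consistent with s1 = X1 - 1)
s <- diff(c(1, unname(m)))

names(m) <- paste0("m", seq_along(m))
names(s) <- paste0("s", seq_along(s))
```

Up to floating-point rounding, m comes out as 0.5, 0.4, 0.12 and s as -0.5, -0.1, -0.28.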

I tried to combine the ideas from this SO question, "R - dplyr - mutate - use dynamic variable names", with lazyeval's interp, so that I could both refer to the other variables dynamically and name the mutated columns dynamically. However, only the last column created was kept and the rename did not work, so I got a single additional column named, for example, X2*X3 in this example with 3 variables. When I had 5, it gave a single additional column named X4*X5.

for(n in 2:n_params) {
     varname <- paste("m", n, sep=".")
     df <- mutate_(df, .dots = setNames(interp(~one*two, one=as.name(paste0("X",n-1)),
                                               two=as.name(paste0("X",n))),varname))
     df
   }

Since I cannot figure out why this does not work, I have set up a series of if statements that calculate the m and s values.

 xx <- data.frame(df) %>%
     mutate(m1 = X1,
            s1 = X1 - 1)
   if(n_params >= 2) {
     xx <- data.frame(xx) %>%
       mutate(m2 = m1 * X2,
              s2 = m2 - m1)
   }
   if(n_params >= 3) {
     xx <- data.frame(xx) %>%
       mutate(m3 = m2 * X3,
              s3 = m3 - m2)
   }
   if(n_params >= 4) {
     xx <- data.frame(xx) %>%
       mutate(m4 = m3 * X4,
              s4 = m4 - m3)
   }
   if(n_params >= 5) {
     xx <- data.frame(xx) %>%
       mutate(m5 = m4 * X5,
         s5 = m5 - m4)
   }
   if(n_params >= 6) {
     xx <- data.frame(xx) %>%
       mutate(m6 = m5 * X6,
              s6 = m6 - m5)
   }

It seems like I should be able to write a function that generates these steps.

In pseudocode:

function(n_params) {
 function(x) {
   new_df <- df %>% 
            mutate(m1 = X1,
                  s1 = X1 - 1)
   for(i in 2:n_params){
    new_df <- append(call to new_df, 
             mutate(m_i = m_(i-1) * X_i,
                   s_i = m_i - m_(i-1)))
     }
   }
}

However, I cannot figure out how to combine the lazyeval interp and the setNames to allow for referring to the previous mutated value.

I could just leave it as a series of if statements, but I'd love to make this more compact if possible.

The final output of interest is the average s value over all permutations for each initial variable. I compute that in a separate function.

jessi

1 Answer

Not the prettiest thing, but it works:

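library(dplyr)

n_params = 3

# m1 and s1 are the special first case
xx1 = df %>%
    mutate(m1 = X1,
           s1 = X1 - 1)

for (i in 2:n_params) {
    xx1 = xx1 %>%
        # build the expression as a string, e.g. "m1 * X2", and name the column m2
        mutate_(.dots = setNames(list(varval = paste0("m", i - 1, " * X", i)),
                                 paste0("m", i))) %>%
        # likewise "m2 - m1", named s2
        mutate_(.dots = setNames(list(varval = paste0("m", i, " - m", i - 1)),
                                 paste0("s", i)))
}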

There are probably much better ways to use lazyeval. Hopefully someone else will post a nicer answer, but this does match the xx produced in your question (for n_params = 3):

identical(xx, xx1)
# [1] TRUE
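For comparison, the same recurrence can be sketched without `mutate_` (the underscore verbs were deprecated in later dplyr releases). This swaps in plain base R column assignment rather than dplyr or lazyeval. The helper name `add_m_s` and the small `toy` data frame are hypothetical, and the sketch assumes the input columns are named X1 through Xn.

```r
# Hypothetical helper (not part of the answer above): adds m1..mn and s1..sn
# to a data frame that has columns X1..Xn, using base R only.
add_m_s <- function(data, n_params) {
  prev_m <- rep(1, nrow(data))  # treat m0 as 1, so that s1 = X1 - 1
  for (i in seq_len(n_params)) {
    m_i <- prev_m * data[[paste0("X", i)]]   # m_i = m_(i-1) * X_i
    data[[paste0("m", i)]] <- m_i
    data[[paste0("s", i)]] <- m_i - prev_m   # s_i = m_i - m_(i-1)
    prev_m <- m_i
  }
  data
}

# X columns from the first two rows of the question's df
toy <- data.frame(X1 = c(0.5, 0.5), X2 = c(0.8, 0.3), X3 = c(0.3, 0.8))
res <- add_m_s(toy, n_params = 3)
```

Using `seq_len(n_params)` also covers `n_params = 1`, whereas `2:n_params` would count down over 2 and 1 in that case. The new columns are appended in the order m1, s1, m2, s2, and so on, the same order the loop above produces.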
Gregor Thomas
  • I think I may have a version issue. If I try this, xx1 does create m2 and s2, but when it tries to create m3, it cannot find m2: `Error: object 'm2' not found`. I'm running R version 3.2.2 (2015-08-14) with lazyeval_0.1.10 and dplyr_0.4.3; the error refers to Rcpp, which is Rcpp_0.12.1. `stop(structure(list(message = "object 'm2' not found", call = NULL, cppstack = NULL), .Names = c("message", "call", "cppstack" ), class = c("Rcpp::eval_error", "C++Error", "error", "condition" )))` – jessi Dec 22 '15 at 22:55
  • My Rcpp version is 0.12.2, but otherwise we match. – Gregor Thomas Dec 22 '15 at 22:58
  • My error, I had `xx1 = xx %>%` - that would never work. Thanks for solving this. My big question: I wonder why `list` works and saves the values and names but `interp` did not. – jessi Dec 22 '15 at 23:13
  • I just went off the question you linked... the `setNames(list())` idiom is used there, even when `interp` is also used. I would only expect `interp` to help simplify the variable name construction and clean up some of the `paste`ing. – Gregor Thomas Dec 22 '15 at 23:21
  • Maybe my mistake all along was dropping `list()`. Thanks. – jessi Dec 22 '15 at 23:30