I would like to compute a value from two columns of a data.frame, and I would like to write a function that takes the column names as arguments so that I can run similar analyses on different data.frames.

The following works as desired:

my.data.frame %>%
    group_by_(.dots = c("label1", "label2")) %>%
    summarise(disc.score = my.func(col1, col2))

where my.func is a function that expects two atomic numeric vectors as parameters.

What I would like to be able to do is something like this:

my.data.frame %>%
    group_by_(.dots = c("label1", "label2")) %>%
    summarise(disc.score = my.func(as.name("col1"), as.name("col2")))

However, this returns Error: object of type 'symbol' is not subsettable. The expression in my.func that triggers the complaint is y_col[x_col <= div], where x_col is the symbol col1 and y_col is the symbol col2.
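The error can be reproduced outside dplyr, which suggests the symbol reaches my.func unevaluated instead of being replaced by the column. A minimal sketch in base R (the names y_sym and msg are only for illustration):

```r
# as.name("y") creates a symbol, not the column's values.
y_sym <- as.name("y")

# Subsetting a symbol fails; capture the error message instead of stopping.
msg <- tryCatch(
    y_sym[TRUE],
    error = function(e) conditionMessage(e)
)
print(msg)
```

So the symbol has to be substituted into the expression before it is evaluated, not passed to the function as a value.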

I have also tried to accomplish this using summarise_() with no success. How can two columns be specified with variable names in a function called within summarise()?

Edit:

Small Working Example:

# Count the y values whose paired x value is at or below the cutoff.
my.func <- function(x_col, y_col, cutoff) {
    return(length(y_col[x_col <= cutoff]))
}

my.data.frame <- data.frame(label = c( rep("A", 5), rep("B", 5)), 
                            x = c(1:10), 
                            y = c(11:20))

# this function call works:
my.data.frame %>%
    group_by_("label") %>%
    summarize(disc.score = my.func(x, y, 6))

# this one does not:
my.data.frame %>%
    group_by_("label") %>%
    summarize(disc.score = my.func(as.name("x"), as.name("y"), 6))
weitzner
  • Why don't you just use `my.func(col1, col2)`? Seems like that would work, but you may need to look at `substitute()`. It would help to have a reproducible example and a look at the function. – Rich Scriven Aug 13 '15 at 04:03
  • I would like for `col1` and `col2` to be variables passed in from another function, so I'd have to pass them through as strings. – weitzner Aug 13 '15 at 04:05
  • As an aside, can't you just do `group_by_(.dots=c("a","b"))` instead of all the `as.symbol` mucking around? – thelatemail Aug 13 '15 at 04:23
  • A small reproducible example would help. – phiver Aug 13 '15 at 06:48
  • Your example doesn't return a single value per group and so is not working for me. Have you looked at [this question/answer](http://stackoverflow.com/questions/26724124/standard-evaluation-in-dplyr-summarise-on-variable-given-as-a-character-string) about standard evaluation and dplyr? – aosmith Aug 13 '15 at 14:50
  • Using `summarize_(disc.score = interp(~my.func(x_col_name, y_col_name, 6), .values = c(x_col_name = as.name("x"), y_col_name = as.name("y"))))` does the trick! – weitzner Aug 13 '15 at 15:54
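The last comment solves this with `summarize_()` and `interp()`. For later dplyr releases, where the underscored verbs are deprecated, the same idea might be sketched with the `.data` pronoun instead. This is a different technique from the one in the comment, and it assumes dplyr 1.0 or newer; the wrapper name `score_by_group` and its argument names are hypothetical.

```r
library(dplyr)

# Count the y values whose paired x value is at or below the cutoff.
my.func <- function(x_col, y_col, cutoff) {
    length(y_col[x_col <= cutoff])
}

# Hypothetical wrapper: every column is given as a character string.
score_by_group <- function(df, group_name, x_name, y_name, cutoff) {
    df %>%
        # all_of() selects the grouping column(s) named in the string.
        group_by(across(all_of(group_name))) %>%
        # .data[[name]] looks the column up by its string name within each group.
        summarise(disc.score = my.func(.data[[x_name]], .data[[y_name]], cutoff))
}

my.data.frame <- data.frame(label = c(rep("A", 5), rep("B", 5)),
                            x = 1:10,
                            y = 11:20)

result <- score_by_group(my.data.frame, "label", "x", "y", 6)
print(result)
```

With the example data, group A has five x values at or below 6 and group B has one, so disc.score should be 5 and 1.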
