I would like to compute a value from two columns of a data.frame, but I would also like to wrap this in a function that takes the column names as arguments, so that I can run the same analysis on different data.frames.
The following works as desired:
    my.data.frame %>%
      group_by_(.dots = c("label1", "label2")) %>%
      summarise(disc.score = my.func(col1, col2))
where my.func is a function that expects two atomic numeric vectors as parameters.
What I would like to be able to do is something like this:
    my.data.frame %>%
      group_by_(.dots = c("label1", "label2")) %>%
      summarise(disc.score = my.func(as.name("col1"), as.name("col2")))
However, this returns Error: object of type 'symbol' is not subsettable. The expression in my.func that triggers the error is y_col[x_col <= div], where x_col is "col1" and y_col is "col2".
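The same error can be reproduced in base R without dplyr, which suggests that my.func is receiving the symbol object itself rather than the column it names. A minimal sketch (the variable names y_sym and msg are only for illustration):

```r
# as.name() returns a symbol, not the data stored in a column of that name.
y_sym <- as.name("y")

# Subsetting a symbol fails; capture the message instead of stopping.
msg <- tryCatch(y_sym[1], error = function(e) conditionMessage(e))
print(msg)
```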
I have also tried to accomplish this using summarise_(), with no success. How can two columns be specified by name, through variables, in a function called within summarise()?
Edit:
Small Working Example:
    library(dplyr)
    # Count the y values whose corresponding x value is at or below the cutoff.
    my.func <- function(x_col, y_col, cutoff) {
      length(y_col[x_col <= cutoff])
    }
    my.data.frame <- data.frame(label = c(rep("A", 5), rep("B", 5)),
                                x = 1:10,
                                y = 11:20)
    # this function call works:
    my.data.frame %>%
      group_by_("label") %>%
      summarize(disc.score = my.func(x, y, 6))
    # this one does not:
    my.data.frame %>%
      group_by_("label") %>%
      summarize(disc.score = my.func(as.name("x"), as.name("y"), 6))
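For comparison, here is a minimal sketch of one approach that should work with newer dplyr releases. It assumes dplyr >= 1.0.0, where the .data pronoun and across(all_of(...)) are available; it is not a fix for the group_by_()/summarise_() calls above. The wrapper name score_by_group and its arguments label_name, x_name, and y_name are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Same helper as in the example: counts y values whose x is <= cutoff.
my.func <- function(x_col, y_col, cutoff) {
  length(y_col[x_col <= cutoff])
}

my.data.frame <- data.frame(label = c(rep("A", 5), rep("B", 5)),
                            x = 1:10,
                            y = 11:20)

# Hypothetical wrapper: all column names arrive as character strings.
score_by_group <- function(df, label_name, x_name, y_name, cutoff) {
  df %>%
    # all_of() selects the grouping column(s) named in the character vector.
    group_by(across(all_of(label_name))) %>%
    # .data[[name]] looks up the column by its string name within each group,
    # so my.func receives numeric vectors rather than symbols.
    summarise(disc.score = my.func(.data[[x_name]], .data[[y_name]], cutoff))
}

result <- score_by_group(my.data.frame, "label", "x", "y", 6)
print(result)
```

With the example data, group A has five x values at or below 6 and group B has one, so disc.score should come out as 5 and 1. Passing strings keeps the wrapper easy to call on different data.frames, at the cost of tying it to a newer dplyr than the one the underscore verbs belong to.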