Due to the non-standard evaluation used in the dplyr verbs, filter will always look for an actual variable called gene in your data frame, rather than looking for the column name being passed to the function. To get around this, you need to capture the symbol being passed to the function (known as "quoting"), then tell filter that you want to "unquote" that variable.
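As a rough sketch of that two-step quote-then-unquote approach, the version below uses enquo() and !!, both re-exported by dplyr from rlang. The data frame toy and the function name test_quoted are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(sample = 1:4, genea = c(1L, 0L, 1L, 1L))

test_quoted <- function(data, gene) {
  # Step 1, "quoting": capture the expression the caller passed as `gene`
  gene_quo <- enquo(gene)
  # Step 2, "unquoting": splice the captured expression into filter()
  data %>% filter((!!gene_quo) == 1) %>% select(sample)
}

test_quoted(toy, genea)
```

The parentheses around !!gene_quo are not strictly needed, but they make the intended precedence explicit.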
Since rlang 0.4.0 (which contains much of the machinery behind the tidyverse's use of non-standard evaluation), we have been able to achieve these two operations in a single step, using the curly-curly syntax filter({{gene}} == 1).
Note also that your function stores its result in the variable t, but doesn't return anything, so a working version of your function would be:
library(dplyr)
test <- function(gene) {
  df %>% filter({{gene}} == 1) %>% select(sample)
}
We can see that this does the trick:
test(genea)
#>   sample
#> 1      1
#> 2      2
#> 3      3
#> 5      5
A further point to note is that it is not great practice for a function to rely on variables that are presumed to exist in the calling scope. Rather than having your function refer to df, you should probably have it explicitly take a data argument, and rather than assuming your data has a column called sample, you should pass the column you wish to select.
Your function might then look like this:
test <- function(data, gene, col) {
  data %>% filter({{gene}} == 1) %>% select({{col}})
}
And be called like this:
test(df, genea, sample)
This gives the same result, but is a more useful function which can be used whatever the names of your data frame and sample column are.
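To illustrate that last point, here is a small sketch applying the same function to a differently named data frame. The patients data frame and its columns id and brca1 are hypothetical, and the function is repeated only so the block runs on its own.

```r
library(dplyr)

# Same three-argument function as above
test <- function(data, gene, col) {
  data %>% filter({{gene}} == 1) %>% select({{col}})
}

# Hypothetical data with different data frame and column names
patients <- data.frame(
  id = c("p1", "p2", "p3"),
  brca1 = c(0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keeps the rows where brca1 is 1 and returns only the id column
test(patients, brca1, id)
```

Nothing in the function body refers to df, genea, or sample, so it makes no assumptions about the calling scope.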
Created on 2022-11-10 with reprex v2.0.2
Data in reproducible format
df <- structure(list(sample = 1:6, genea = c(1L, 1L, 1L, 0L, 1L, 0L
), geneb = c(1L, 1L, 0L, 0L, 0L, 0L), genec = c(1L, 1L, 0L, 0L,
1L, 0L), gened = c(0L, 0L, 1L, 0L, 1L, 0L), genee = c(0L, 0L,
1L, 0L, 1L, 0L), genef = c(0L, 0L, 1L, 0L, 1L, 0L)),
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))