I am working on a function to perform PCA on a dataset, and I wanted to write a function to do the same stuff on different columns. However, I'm having a hard time doing so because I can't seem to make the function understand that I'm passing through columns. As an example:
perform_pca <- function(columns_to_exclude = c()) {
pca <- data %>%
select(-column_to_exclude) %>%
other_stuff() %>%
prcomp()
pvar_pve <- tibble(
p.var = pca$sdev ^ 2 / sum(pca$sdev ^ 2),
pve = cumsum(p.var),
row_id = seq(1, length(pca) - length(columns_to_exclude))
)
ggplot(pvar_pve, ...other things)
}
However, doing afterwards
perform_pca(c(data$column1, data$column2, whatever_else))
only works if I call it without arguments. If I pass it one or more columns, it gives me an error message about the tibble length.
Put another way, what is the correct way of passing tibble columns into functions so that dplyr recognizes them as such? For example
test <- function(columns) {
data %>%
select(columns)
}
test(c(var1,var2))
would return an error. What's the correct way to actually do this?