I am writing a function that performs PCA on a dataset, and I want it to run the same steps on different sets of columns. However, I can't get the function to recognize that I'm passing columns to it. For example:

perform_pca <- function(columns_to_exclude = c()) {
  pca <- data %>%
    select(-columns_to_exclude) %>%
    other_stuff() %>%
    prcomp()
  pvar_pve <- tibble(
    p.var = pca$sdev ^ 2 / sum(pca$sdev ^ 2),
    pve = cumsum(p.var),
    row_id = seq(1, length(pca) - length(columns_to_exclude))
  )
  ggplot(pvar_pve, ...other things)
}

However, calling it afterwards with columns, like this,

perform_pca(c(data$column1, data$column2, whatever_else))

fails. The function only works if I call it without arguments. If I pass it one or more columns, I get an error message about the tibble length.

Put another way, what is the correct way of passing tibble columns into a function so that dplyr recognizes them as columns? For example,

test <- function(columns) {
  data %>%
    select(columns)
}

test(c(var1,var2))

would return an error. What's the correct way to actually do this?

1 Answer

You can do it without curly brackets by having the function take ..., forwarding that to select(), and passing the column names as separate arguments:

library(tidyverse)

data <- tibble(
  a = 1:10,
  b = rnorm(10),
  c = letters[1:10],
  d = 21:30
)


test <- function(data, ...) {
  data %>%
    select(-c(...))
}

test(data, a, b)
#> # A tibble: 10 × 2
#>    c         d
#>    <chr> <int>
#>  1 a        21
#>  2 b        22
#>  3 c        23
#>  4 d        24
#>  5 e        25
#>  6 f        26
#>  7 g        27
#>  8 h        28
#>  9 i        29
#> 10 j        30

See here for info on this and other ways of doing things with tidy evaluation. Doing it this way, with data as the first argument, has two benefits: you can pipe your dataframe into the function, and it will use 'tidyselect' to suggest variables from inside your dataframe environment as arguments to the function.
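To illustrate the piping point, here is a small sketch that redefines the test() function from above and pipes a tibble into it. The tibble is made up for the example, and only dplyr and tibble are loaded rather than the whole tidyverse.

```r
library(dplyr)
library(tibble)

# Made-up example data (not the asker's real dataset)
data <- tibble(
  a = 1:3,
  b = c(0.1, 0.2, 0.3),
  c = letters[1:3],
  d = 21:23
)

# Same function as above: drops whichever columns are passed via ...
test <- function(data, ...) {
  data %>%
    select(-c(...))
}

# Because `data` is the first argument, the function slots into a pipeline
result <- data %>% test(a, b)
names(result)
#> [1] "c" "d"
```

Keeping the data frame as the first argument is what lets the call read the same way as the built-in dplyr verbs.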

You can also pass a vector of columns, which is where the curly brackets ({{ }}) are needed:

test <- function(data, vars) {
  data %>%
    select(-c({{vars}}))
}


test(data, c(a, b))
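
Applied to the function from the question, the same pattern might look like the sketch below. Several parts are assumptions rather than the asker's actual code: the example tibble is invented, select(where(is.numeric)) stands in for the unspecified other_stuff() step, the plotting call is left out so the function returns the variance table, and the default value for columns_to_exclude is dropped for simplicity. The row index is also built from pca$sdev instead of length(pca), since length(pca) counts the elements of the prcomp object rather than the number of components.

```r
library(dplyr)
library(tibble)

# Invented example data; the question's real `data` is not shown
data <- tibble(
  a = 1:10,
  b = rnorm(10),
  c = letters[1:10],
  d = 21:30,
  e = runif(10)
)

perform_pca <- function(data, columns_to_exclude) {
  pca <- data %>%
    # {{ }} forwards the unquoted column names to select()
    select(-c({{ columns_to_exclude }})) %>%
    # Assumption: stand-in for other_stuff(); keeps numeric columns only
    select(where(is.numeric)) %>%
    prcomp()

  # Proportion of variance explained by each principal component
  p_var <- pca$sdev^2 / sum(pca$sdev^2)

  tibble(
    p.var = p_var,
    pve = cumsum(p_var),
    # One row per component, so the column lengths always match
    row_id = seq_along(pca$sdev)
  )
}

# Columns are passed unquoted, not as data$a
pvar_pve <- perform_pca(data, c(a, c))
pvar_pve
```

The returned tibble can then be handed to ggplot() as in the original function.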
Andy Baxter