I'm trying to create a function where you can pass a data frame and the name of one of the columns. Within the function it should mutate the data frame to create a scaled version of the column you sent. Here's my attempt:

test_scale <- function(outcome, data){
  outcome_scaled = paste0(outcome, "_s")
  data = data %>% mutate(!!outcome_scaled := scale(as.numeric(outcome)))
  print(head(data[, outcome_scaled]))
}

However, this doesn't work since it just prints the text of whatever outcome I pass it.

> test_scale("age", df)
mutate: new variable 'age_s' (character) with one unique value and 0% NA
[1] "age" "age" "age" "age" "age" "age"

How do I get the actual value of outcome instead of the string text of the outcome variable that's passed?

Parseltongue
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Maybe check out the examples at https://dplyr.tidyverse.org/articles/programming.html first. – MrFlick Dec 02 '21 at 04:27
  • May I ask which version of `dplyr` you are using? – Park Dec 02 '21 at 04:34
  • Variations of this question are asked and answered a lot - in keeping with the latest dplyr semantics, you want something like `test_scale <- function(outcome, data) { data %>% mutate(across({{outcome}}, ~ scale(as.numeric(.x)), .names = "{col}_s")) }` – Ritchie Sacramento Dec 02 '21 at 04:37
  • who is going to write the package that takes any arbitrary dplyr code and translates it into whatever the latest dplyr semantics are? bonus points if you have to coerce your code to a data frame for it to work – rawr Dec 02 '21 at 05:02

1 Answer

Edit

Ritchie Sacramento's answer in the comments is better; use that.
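
For reference, here is a minimal sketch of the across()-based approach from that comment, written out as a runnable example. It assumes dplyr 1.0 or later and uses iris as stand-in data; the `.names` spec is written as "{.col}_s", the form documented in current dplyr. Note that scale() returns a one-column matrix, so the new column is a matrix column.

```r
library(dplyr)

# Sketch of the across()-based version suggested in the comments.
# {{ outcome }} forwards the column (bare name or string literal) to across(),
# and .names builds the new column name from the selected column's name.
test_scale <- function(outcome, data) {
  data %>%
    mutate(across({{ outcome }}, ~ scale(as.numeric(.x)), .names = "{.col}_s"))
}

# iris is used here only as example data
scaled <- test_scale(Sepal.Length, iris)
head(scaled$Sepal.Length_s)
```

Returning the data frame, rather than printing inside the function, lets the caller keep piping the result.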

--

Here is one potential solution:

library(tidyverse)

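test_scale <- function(outcome, data){
  outcome <- ensym(outcome)  # accepts a bare column name or a string
  outcome_scaled <- paste0(as.character(outcome), "_s")
  # !! on the left of := uses the value of outcome_scaled as the new column name
  data2 <- data %>% mutate(!!outcome_scaled := scale(as.numeric(!!outcome)))
  print(head(data2[, outcome_scaled]))
}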
test_scale("Sepal.Length", iris)
#>            [,1]
#> [1,] -0.8976739
#> [2,] -1.1392005
#> [3,] -1.3807271
#> [4,] -1.5014904
#> [5,] -1.0184372
#> [6,] -0.5353840

Because the function uses ensym(), you can also pass the column name unquoted:

test_scale(Sepal.Length, iris)
#>            [,1]
#> [1,] -0.8976739
#> [2,] -1.1392005
#> [3,] -1.3807271
#> [4,] -1.5014904
#> [5,] -1.0184372
#> [6,] -0.5353840

Created on 2021-12-02 by the reprex package (v2.0.1)
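
If the column name will always arrive as a string, another option is dplyr's `.data` pronoun. The sketch below is an illustration under that assumption; `test_scale_chr` is a hypothetical name, and wrapping scale() in as.numeric() is a choice made here to get a plain numeric column instead of a one-column matrix.

```r
library(dplyr)

# Hypothetical string-only variant: .data[[outcome]] looks the column up by name,
# and !! on the left of := sets the new column's name from a string.
test_scale_chr <- function(outcome, data) {
  outcome_scaled <- paste0(outcome, "_s")
  data %>%
    mutate(!!outcome_scaled := as.numeric(scale(as.numeric(.data[[outcome]]))))
}

# iris is used here only as example data
res <- test_scale_chr("Sepal.Length", iris)
head(res$Sepal.Length_s)
```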

jared_mamrot