
I'm trying to write a function that wraps the tidymodels function `initial_split()` (from the rsample package) and takes an argument that lets me change the strata to a different variable each time I call it.

Calling `initial_split()` directly like this works perfectly:

```r
split_glab = initial_split(data, prop = 0.7, strata = sp_glabrata)
```

Then I converted it to a function and plugged in my species parameter:

```r
split_data = function(df, species) {
  initial_split(df, prop = 0.7, strata = species)
}

split_data(data, species = sp_glabrata)
```

And I get the following error:

```
Error: Can't subset columns that don't exist.
x Column `species` doesn't exist.
```

Of course, this column doesn't exist in my data, since `species` is just an argument of my function; the column I'm trying to reference is called `sp_glabrata`. I can't figure out how to make the function reference the column instead of the parameter. I don't want to hard-code the column name, because I have to apply many similar functions to several columns and it would take forever.

Any guidance would be appreciated!

lmml
Comment (camille, Mar 24 '21 at 16:01): It's hard to know without seeing any of your data. [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with and is useful to future users.

1 Answer


Because rsample is a tidyverse-style package that uses tidy evaluation, you can use the curly-curly operator (`{{ }}`) to forward the unquoted argument so that it is evaluated as a column name:

```r
library(tidymodels)

split_data <- function(df, species) {
  # {{ }} passes the bare column name through to initial_split()
  initial_split(df, prop = 0.7, strata = {{ species }})
}
```
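For context, `{{ }}` is rlang shorthand for capturing an argument with `rlang::enquo()` and injecting it with `!!`. The sketch below assumes a current tidymodels install and defines a hypothetical `split_data_enquo()` written in that longer form; with the same seed it should select the same training rows as the curly-curly version.

```r
library(tidymodels)

# Curly-curly version from the answer
split_data <- function(df, species) {
  initial_split(df, prop = 0.7, strata = {{ species }})
}

# Hypothetical long-hand equivalent: capture the unquoted argument as a
# quosure, then inject it into initial_split() with !!
split_data_enquo <- function(df, species) {
  species <- rlang::enquo(species)
  initial_split(df, prop = 0.7, strata = !!species)
}

# Same seed, same column: both wrappers should draw the same rows
set.seed(42)
a <- split_data(iris, Species)
set.seed(42)
b <- split_data_enquo(iris, Species)
identical(training(a), training(b))
```

The `{{ }}` form is usually preferred because it is shorter and does the capture and injection in one step.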

Testing:

```r
split_data(iris, species = Species)
#<Analysis/Assess/Total>
#<105/45/150>
```
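Since the goal is to reuse the same function across several columns, one way to drive it from a vector of column names is to turn each string into a symbol with `rlang::sym()` and inject it with `!!`. This is only a sketch: `ggplot2::diamonds` and its `cut`/`color` columns stand in for the real data and its `sp_*` columns, and `strata_cols` is a hypothetical name.

```r
library(tidymodels)

# Curly-curly wrapper from the answer
split_data <- function(df, species) {
  initial_split(df, prop = 0.7, strata = {{ species }})
}

# Hypothetical strata columns; ggplot2::diamonds stands in for the real data
strata_cols <- c("cut", "color")

set.seed(123)
splits <- lapply(strata_cols, function(col) {
  # rlang::sym() turns the string into a symbol; !! injects it, so the
  # {{ }} inside split_data() receives the bare column name
  split_data(ggplot2::diamonds, !!rlang::sym(col))
})
names(splits) <- strata_cols

# Training rows for the split stratified on `cut`
train_cut <- training(splits$cut)
```

Each element of `splits` is an ordinary `rsplit` object, so `training()` and `testing()` work on it as usual.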
akrun