
I want to write one function that lets me easily run multiple models. Only the input variables used by each model differ. I use the rpart function for these models. Ideally I would have a table (named variables) listing the models and their variables, something that looks like this:

model1           model2     model3         …………………
gender          gender      age
age             education   wageparents
education                   nfriends
                            married

and then have a function that I can simply call as fun(data, variables).

What I have used so far is:

tree <-rpart(wage ~  gender + age + education, method='class', data=Data, control=rpart.control(minsplit=1, minbucket=1, cp=0.002))

This works, but I have to change the model formula every time.

I tried something like this, but I am not sure what data type I have to use, etc.

wagefun <- function(Data, variables$model1){
  tree <-rpart(wage ~  variables$model1,  method='class', data=Data, control=rpart.control(minsplit=1, minbucket=1, cp=0.002))
  return(tree)
}
asachet
Stan
  • Try `rpart(wage ~ ., method='class', data=Data[, c("wage", variables$model1)], control=rpart.control(minsplit=1, minbucket=1, cp=0.002))` – Roland Aug 07 '19 at 08:23
  • You can use purrr to iterate over formulae stored in a vector as described in this question: https://stackoverflow.com/questions/48450308/iterating-over-formulas-in-purrr – r.bot Aug 07 '19 at 08:27
  • You need to construct the formula. This is typically done by pasting together a string and calling `as.formula`, but there are better ways. See https://stackoverflow.com/questions/12967797/is-there-a-better-alternative-than-string-manipulation-to-programmatically-build – asachet Aug 07 '19 at 08:34
  • Possible duplicate of [Is there a better alternative than string manipulation to programmatically build formulas?](https://stackoverflow.com/questions/12967797/is-there-a-better-alternative-than-string-manipulation-to-programmatically-build) – asachet Aug 07 '19 at 08:35

1 Answer

Create the formula with reformulate:

form <- reformulate(termlabels = variables$model1, response = "wage", intercept = TRUE)
rpart(form, ...)

Note the intercept term that you have ignored so far: it is an additional modelling choice.
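To apply this to every model at once, here is a minimal sketch. It assumes `variables` is stored as a named list of character vectors rather than a table, since the models have different numbers of variables. The helper names `make_formulas` and `fit_trees` and the toy `Data` frame are hypothetical, made up purely for illustration; the `rpart` settings are copied from the question.

```r
library(rpart)  # recommended package, shipped with standard R installations

# Assumed layout: one character vector of predictor names per model.
variables <- list(
  model1 = c("gender", "age", "education"),
  model2 = c("gender", "education"),
  model3 = c("age", "wageparents", "nfriends", "married")
)

# Hypothetical helper: build one formula per model with reformulate(),
# e.g. wage ~ gender + age + education for model1.
make_formulas <- function(variables, response = "wage") {
  lapply(variables, function(v) reformulate(termlabels = v, response = response))
}

# Hypothetical helper: fit one classification tree per formula,
# using the control settings from the question.
fit_trees <- function(data, variables, response = "wage") {
  lapply(make_formulas(variables, response), function(form) {
    rpart(form, method = "class", data = data,
          control = rpart.control(minsplit = 1, minbucket = 1, cp = 0.002))
  })
}

# Made-up toy data so the sketch runs on its own; replace with your own Data.
set.seed(1)
n <- 40
Data <- data.frame(
  wage        = factor(sample(c("low", "high"), n, replace = TRUE)),
  gender      = factor(sample(c("f", "m"), n, replace = TRUE)),
  age         = sample(20:60, n, replace = TRUE),
  education   = factor(sample(c("school", "college"), n, replace = TRUE)),
  wageparents = round(runif(n, 1000, 5000)),
  nfriends    = sample(0:20, n, replace = TRUE),
  married     = factor(sample(c("no", "yes"), n, replace = TRUE))
)

trees <- fit_trees(Data, variables)  # named list: trees$model1, trees$model2, ...
```

The result is a list with one fitted tree per model, so `trees$model1` corresponds to the formula you typed by hand in the question. Adding a model then only means adding an entry to `variables`.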

asachet