I'm currently working on regression and classification with R.

I'm using a formula of the form X ~ Y to make predictions about X. I am now trying to use a function inside a for-loop to fit several models, with a different variable on the X side of the tilde each time and the same variables on the Y side. Something like this:

X1 ~ Y
X2 ~ Y
X3 ~ Y

where X1, X2, X3 and Y are all columns of a data frame (data$X1 <- a, data$X2 <- b, data$X3 <- c, data$Y), in case that matters.

So how can I dynamically select a variable inside the ~ expression? I tried something like this, but it does not work:

# referring to "iris" data set with columns (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species)

getFormula <- function(variable){
  variable ~ Sepal.Length + Sepal.Width + Species
}

petal.length.formula <- getFormula(Petal.Length)
petal.width.formula <- getFormula(Petal.Width)

I get this:

petal.length.formula: variable ~ Sepal.Length + Sepal.Width + Species
petal.width.formula: variable ~ Sepal.Length + Sepal.Width + Species

but I want to achieve this:

petal.length.formula: Petal.Length ~ Sepal.Length + Sepal.Width + Species
petal.width.formula: Petal.Width ~ Sepal.Length + Sepal.Width + Species

Since I have over 40 variables on the Y side and 10 variables on the X side, it would be really messy to type every single formula by hand. Can anybody help me with this issue?

I could not find a similar question, since I have a hard time figuring out which keywords to search for.

If possible, I would prefer not to use any additional library, since I'm rather new to R and want to first figure out the basic mechanics of R.

Since English is not my first language, I hope you can understand my question, and I am of course happy to explain further if needed. Thank you in advance for your time!

coemu
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Functions like `reformulate()` and `update()` will help. – MrFlick May 28 '20 at 20:09
  • Thank you for your help in formulating my question. I have added a reproducible example. Is this OK now? – coemu May 28 '20 at 21:34

1 Answer


You can try this; you need to pass the variable name as a character string. It's much easier that way, and if you have 10 variables on the X side, you can easily iterate through them:

getFormula <- function(variable){
  as.formula(paste(variable,"~ Sepal.Length + Sepal.Width + Species"))
}

petal.length.formula <- getFormula("Petal.Length")
petal.width.formula <- getFormula("Petal.Width")

lm(petal.length.formula, data = iris)

Call:
lm(formula = petal.length.formula, data = iris)

Coefficients:
      (Intercept)       Sepal.Length        Sepal.Width  Speciesversicolor  
         -1.63430            0.64631           -0.04058            2.17023  
 Speciesvirginica  
          3.04911 
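
Since the response is now just a string, iterating over several responses is straightforward. The following is only a minimal sketch, assuming the two response columns from the question; the names `responses` and `models` are made up for illustration, and `getFormula` is repeated so the snippet runs on its own:

```r
# Response column names to model (the two from the question).
responses <- c("Petal.Length", "Petal.Width")

# Build a formula from a response name given as a character string.
getFormula <- function(variable) {
  as.formula(paste(variable, "~ Sepal.Length + Sepal.Width + Species"))
}

# Fit one linear model per response and keep the fits in a named list.
models <- lapply(responses, function(resp) lm(getFormula(resp), data = iris))
names(models) <- responses

# Inspect the coefficients of one of the fits.
coef(models[["Petal.Width"]])
```

With 40 or more columns, `responses` could be taken from `names(data)` instead of being typed by hand.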

You can also try reformulate, as suggested by @BenBolker and @MrFlick:

getFormula <- function(variable){
  reformulate(c("Sepal.Length", "Sepal.Width", "Species"),
              response = variable, intercept = TRUE)
}

lm(getFormula("Petal.Length"),data=iris)

Call:
lm(formula = getFormula("Petal.Length"), data = iris)

Coefficients:
      (Intercept)       Sepal.Length        Sepal.Width  Speciesversicolor  
         -1.63430            0.64631           -0.04058            2.17023  
 Speciesvirginica  
          3.04911 
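
The comments also mention `update()`. As a rough sketch of how that might look in base R, you could start from one complete formula and swap only the left-hand side; `swapResponse`, `base_formula` and `width_formula` are hypothetical names used here for illustration:

```r
# Start from one complete formula.
base_formula <- Petal.Length ~ Sepal.Length + Sepal.Width + Species

# Replace only the response. In update(), "." on the right-hand side of the
# new formula means "keep the old right-hand side".
swapResponse <- function(old, response) {
  update(old, reformulate(".", response = response))
}

width_formula <- swapResponse(base_formula, "Petal.Width")
width_formula

fit <- lm(width_formula, data = iris)
```

This keeps the predictors in a single place, which may be convenient if the right-hand side changes later.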
StupidWolf