
I was trying to automate a piece of my code so that programming becomes less tedious.

Basically, I was trying to do a stepwise selection of variables using fastbw() in the rms package. I would like to pass the list of variables selected by fastbw() into a formula such as y ~ x1+x2+x3, where "x1", "x2", "x3" are the variables selected by fastbw().

Here is the code I tried, which did not work:

olsOAW0.r060 <- ols(roll_pct~byoy+trans_YoY+change18m, 
                    subset= helper=="POPNOAW0_r060", 
                    na.action = na.exclude, 
                    data = modelready)

OAW0 <- fastbw(olsOAW0.r060, rule="p", type="residual", sls= 0.05)

vec <- as.vector(OAW0$names.kept, mode="any")

b <- paste(vec, sep ="+") ##I even tried b <- paste(OAW0$names.kept, sep="+")

bestp.OAW0.r060 <- lm(roll_pct ~ b , 
                      data = modelready, 
                      subset = helper =="POPNOAW0_r060",    
                      na.action = na.exclude)

I am new to R and still haven't climbed the steep learning curve, so I apologize for any obvious programming blunders.

zx8754
Anand

6 Answers


You're almost there. You just have to paste the entire formula together, something like this:

paste("roll_pct ~ ",b,sep = "")

Then coerce it to an actual formula using as.formula and pass that to lm. Technically, I think lm may coerce a character string itself, but coercing it yourself is generally safer. (Some functions that expect formulas will do the coercion for you; others won't.)
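The steps above can be sketched end to end as follows. This is only an illustration: the built-in mtcars data stands in for the asker's modelready, and the `kept` vector is a hypothetical stand-in for OAW0$names.kept.

```r
# Hypothetical stand-in for OAW0$names.kept (mtcars replaces modelready).
kept <- c("wt", "hp", "qsec")

# Join the predictor names into one string, then prepend the response.
rhs <- paste(kept, collapse = "+")
form_chr <- paste("mpg ~ ", rhs, sep = "")

# Coerce explicitly rather than relying on the modelling function to do it.
form <- as.formula(form_chr)
fit <- lm(form, data = mtcars)
```

Printing `form` at the console before fitting is a quick way to check that the string was assembled as intended.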

joran
  • Joran, thank you, but I still get an error. Please take a look: `hpi <- paste("byoy", "change18m", "change24m", "change18m0", "change24m0", "lag3byoy", "lag3change18m", "lag3change24m", "lag6byoy", "lag6change18m", "lag6change24m", "trans_YoY", sep = "+")`, then `hpi.form <- as.formula(paste("roll_pct~", "hpi", sep = ""))`, then `lmNGC0.r060 <- lm(hpi.form, subset = helper == "POPNOANGC0_r060", na.action = na.exclude, data = modelready)`, which gives `Error in model.frame.default(formula = as.formula(paste("roll_pct~", "hpi",:variable lengths differ (found for 'hpi')` – Anand Feb 13 '12 at 19:05
  • @user1199861 You put `hpi` in quotes in the second line. Type `hpi.form` at the console and you'll see why this is wrong. – joran Feb 13 '12 at 19:13
  • Joran, thanks once again. I think lm() is not coercing the object "hpi" into the formula as a character string. When I tried typing the variables into the lm() formula, it worked. – Anand Feb 13 '12 at 19:22
  • @user1199861 No, you pasted it together wrong. You wrote: `paste("roll_pct~", "hpi", sep = "")`, rather than `paste("roll_pct~", hpi, sep = "")` as I indicated in my answer. – joran Feb 13 '12 at 19:25
  • Joran, thanks, you spotted it. I got it to work this time. Thank you for being patient with me. – Anand Feb 13 '12 at 20:15
  • This doesn't work with the formula argument of the stats::aggregate function. – Peter.k Jul 17 '17 at 13:27
  • @Peter.k Yes, `aggregate` is one of the cases where you'd need to coerce it yourself via `as.formula`. – joran Jul 17 '17 at 14:24
  • I found you need to use `collapse` instead of `sep` in the first `paste` (for the vector) (as suggested in the somewhat overlooked answer by @cconnell) – James Feb 04 '20 at 21:58

You would actually need to use collapse instead of sep when defining b.

b <- paste(OAW0$names.kept, collapse="+")

Then you can plug it into joran's answer:

paste("roll_pct ~ ",b,sep = "")

or just use:

paste("roll_pct ~ ",paste(OAW0$names.kept, collapse="+"),sep = "")
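To see why collapse matters here, a small sketch with made-up variable names (x1, x2, x3 are placeholders taken from the question's wording, not real columns) compares the two arguments on a single character vector:

```r
vars <- c("x1", "x2", "x3")

# sep only separates the *arguments* of paste(); with a single vector
# argument there is nothing to separate, so the vector comes back unchanged.
with_sep <- paste(vars, sep = "+")

# collapse joins the *elements* of the result into one string.
with_collapse <- paste(vars, collapse = "+")

length(with_sep)       # three separate strings
length(with_collapse)  # one string
```

This is why the original `b` produced a length-3 vector that lm() could not use as the right-hand side of a formula.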
cconnell

I ran into a similar issue today. If you want to make it even more generic, so that you don't have to hard-code the name of the class (dependent) variable, you can use:

frmla <- as.formula(paste(colnames(modelready)[1], paste(colnames(modelready)[2:ncol(modelready)], sep = "", 
                              collapse = " + "), sep = " ~ "))

This assumes that the class variable (the dependent variable) is in the first column, but the indexing can easily be switched to the last column:

frmla <- as.formula(paste(colnames(modelready)[ncol(modelready)], paste(colnames(modelready)[1:(ncol(modelready)-1)], sep = "", 
                              collapse = " + "), sep = " ~ "))

Then continue with lm using:

bestp.OAW0.r060 <- lm(frmla , data = modelready, ... )
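As a possible alternative to nesting paste() calls, base R's reformulate() builds the same kind of formula from a character vector of term labels and a response name. The sketch below uses mtcars as a stand-in for modelready, and the names `response`, `predictors`, and `frmla_alt` are introduced only for illustration:

```r
# mtcars stands in for modelready; the first column is taken as the response.
response <- colnames(mtcars)[1]
predictors <- colnames(mtcars)[-1]

# reformulate() assembles response ~ term1 + term2 + ... directly.
frmla_alt <- reformulate(predictors, response = response)

fit <- lm(frmla_alt, data = mtcars)
```

Switching to a last-column response only requires changing the two index expressions.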
discipulus
  • This answer is almost 3 years old, but it is very simple and elegant and saved me a lot of time. Upvoted. – Hatt Jul 31 '18 at 20:23

If you're looking for something less verbose:

fm <- as.formula( paste( colnames(df)[i], ".", sep=" ~ ")) 
                                      # i is the index of the outcome column

Here it is in a function:

getFormula <- function(target, df) {
  # Exact name match; grep() would also match columns that merely contain target
  i <- match(target, colnames(df))
  stopifnot(!is.na(i))
  as.formula(paste(colnames(df)[i],
                   ".",
                   sep = " ~ "))
}
fm <- getFormula("myOutcomeColumnName", myDataFrame)
library(rpart)
rp <- rpart(fm, data = myDataFrame) # Use the formula to build a model
Travis Heeter

One trick that I use in similar situations is to subset your data and simply use e.g. lm(dep_var ~ ., data = your_data).

Example

data(mtcars)
ind_vars <- c("mpg", "cyl")
dep_var <- "hp"

temp_subset <- dplyr::select(mtcars, dplyr::all_of(c(dep_var, ind_vars)))

lm(hp ~., data = temp_subset)
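The example above still types hp into the formula by hand. A sketch of the same trick with the response name also taken from a variable, assuming base R subsetting in place of dplyr, might look like this:

```r
ind_vars <- c("mpg", "cyl")
dep_var <- "hp"

# Keep only the response and the chosen predictors (base R subsetting).
temp_subset <- mtcars[, c(dep_var, ind_vars)]

# Build "hp ~ ." from the variable instead of hard-coding the name.
fm <- as.formula(paste(dep_var, "~ ."))
fit <- lm(fm, data = temp_subset)
```

Because the data frame contains only the wanted columns, the dot expands to exactly the predictors in ind_vars.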
JerryTheForester

To simplify and collect the answers above into a single function:

my_formula<- function(colPosition, trainSet){
    dep_part<- paste(colnames(trainSet)[colPosition],"~",sep=" ")
    ind_part<- paste(colnames(trainSet)[-colPosition],collapse=" + ")
    dt_formula<- as.formula(paste(dep_part,ind_part,sep=" "))
    return(dt_formula)
}

To use it:

my_formula( dependent_var_position, myTrainSet)
ameet chaubal