How to write an lm or glm formula when there are many independent variables

Question

I am reading a data set as follows:

data<-read.csv("test.csv",sep=",",header=T)

The first column of test.csv is the response variable, and the remaining 20 columns are predictor variables. How can I write the lm formula for this kind of scenario? It does not seem like a correct approach to write the formula as

modelfit<-lm(data[,1]~data[,2]+data[,3]+... )

You can use `lm(y~., data=mydata)` to regress the column `y` in `mydata` against all other columns in `mydata`. If you're going to use the formula syntax, I would stay away from indexing (`[ ]`) in the formula. - MrFlick, Nov 08 '14 at 05:35
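To illustrate the dot shorthand from the comment above, here is a minimal sketch. It assumes the built-in `mtcars` data set as a stand-in for test.csv, and the names `df`, `fit_all`, and `fit_no_hp` are purely illustrative.

```r
# Illustrative data: a few columns of the built-in mtcars data set
# stand in for test.csv (response first, predictors after).
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt")]

# `.` expands to every column of `data` except the response.
fit_all <- lm(mpg ~ ., data = df)

# A predictor can be dropped by subtracting it from the dot expansion.
fit_no_hp <- lm(mpg ~ . - hp, data = df)

# Inspect which terms ended up in each model.
names(coef(fit_all))
names(coef(fit_no_hp))
```

Because the formula refers to column names and the data frame is passed through `data =`, functions such as `predict()` can later be given new data with the same column names.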

sayan dasgupta · Accepted Answer · 2014-11-08T05:29:54.287

1

One way to do this is to build the formula from the column names:

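# Read the data; the first column is the response, the rest are predictors
data <- read.csv("test.csv", sep = ",", header = TRUE)
variables <- colnames(data)
depVar <- variables[1]
indepVars <- variables[-1]
# Build "y ~ x1 + x2 + ..." as a string, then convert it to a formula
myformulae <- as.formula(paste(depVar, paste(indepVars, collapse = " + "), sep = " ~ "))
modelfit <- lm(myformulae, data = data)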
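Since the question title also mentions glm, here is a hedged sketch of the same idea using base R's `reformulate()`, which builds the formula from character vectors without manual pasting. The `mtcars` columns and the binomial family are assumptions chosen only for illustration; they are not from the original post.

```r
# Illustrative data: am (0/1) is the response, hp and wt are predictors.
df <- mtcars[, c("am", "hp", "wt")]

response   <- names(df)[1]
predictors <- names(df)[-1]

# reformulate() assembles response ~ term1 + term2 + ... from strings.
f <- reformulate(predictors, response = response)

# The same formula object works for lm() or glm(); here a logistic regression.
fit <- glm(f, data = df, family = binomial)

print(f)
```

The pasted-string approach and `reformulate()` produce equivalent formulas; the latter simply avoids building the string by hand.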

edited Nov 08 '14 at 05:29

answered Nov 08 '14 at 05:16

