I have a data frame, Data1, with 100 columns. One of these variables is the dependent variable; the others are predictors.

I need to extract the names of the 99 predictors into a vector (say varlist) to be used in the equation below:

equation <- as.formula(paste('y', "~", paste(varlist, collapse="+"),collapse=""))

I can use dput on the dataframe to extract all the columns but I could not get rid of the dependent variable y from the list:

varlist <- dput(names(Data1))
  • possible duplicate of [how to succinctly write a formula with many variables from a data frame?](http://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame) – Gavin Simpson Jun 13 '12 at 08:53

1 Answer

It would be much more appropriate to go a different route. If you want to include every variable in your data frame other than the response variable, you can just use y ~ . to specify that.

fakedata <- as.data.frame(matrix(rnorm(100000), ncol = 100))
names(fakedata)[1] <- "y"

o <- lm(y ~ ., data = fakedata)

This fits a regression using the 99 other columns in fakedata as the predictors and 'y' as the response, and stores the result in 'o'.
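
If an explicit vector of predictor names is still wanted, as in the question, one possible sketch is to drop the response name with setdiff() and build the formula with reformulate(). This assumes the response column is literally named "y"; the seed and the object names o_dot and o_explicit are only there for illustration.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
fakedata <- as.data.frame(matrix(rnorm(100000), ncol = 100))
names(fakedata)[1] <- "y"

# Every column name except the response (assumed to be "y")
varlist <- setdiff(names(fakedata), "y")

# reformulate() builds y ~ V2 + V3 + ... from the character vector
equation <- reformulate(varlist, response = "y")

o_dot      <- lm(y ~ ., data = fakedata)     # the shortcut from above
o_explicit <- lm(equation, data = fakedata)  # the explicit formula
```

Both fits use the same 99 predictors, so the explicit formula is only worth the extra step when the name vector is needed elsewhere.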


Edit: If you want to exclude some variables, you can drop them from the data set. The following removes the 10th through the 100th columns, leaving a regression of y on columns 2-9:

o <- lm(y ~ ., data = fakedata[,-(10:100)])
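
Dropping by position can be fragile if the column order changes, so here is a hedged sketch of excluding predictors by name instead. The column names V10 and V11 are simply the defaults that as.data.frame() gives the fake data, and drop_cols, o_subset and o_minus are hypothetical names for this example.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
fakedata <- as.data.frame(matrix(rnorm(100000), ncol = 100))
names(fakedata)[1] <- "y"

# Hypothetical predictors to leave out of the model
drop_cols <- c("V10", "V11")

# Option 1: remove the columns from the data, then use y ~ .
o_subset <- lm(y ~ ., data = fakedata[, setdiff(names(fakedata), drop_cols)])

# Option 2: keep the data as is and subtract the terms in the formula
o_minus <- lm(y ~ . - V10 - V11, data = fakedata)
```

Both options should give the same fit; the formula version is shorter for a handful of names, while the setdiff() version scales better when the names to drop are already in a vector.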