
Let's say I want to fit a model of Y on a set of X's. For example:

fit <- glm(Y~x1+x2+x3,data=mydata,family=binomial())

Instead of using the variable names x1, x2, x3, I'd like to fit Y using the 31 features in columns 20 to 50 of mydata. I've seen this done before but can't find the example. Can anyone please provide one? Thanks.

BobL
  • Another method would be to construct the formula object for the first argument manually, e.g. `as.formula( paste( "Y ~" , paste( names(mydata)[20:50] , collapse = " + " ) ) )`. This, however, gets complicated if you want interactions and quadratic forms for some of your covariates; [**this brilliant answer**](http://stackoverflow.com/a/3594093/1478381) by Joris Meys covers that. – Simon O'Hanlon Jun 12 '13 at 21:45
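A minimal sketch of the formula-building approach described in the comment above, applied to `glm()` as in the question. The simulated `mydata` (60 noise columns plus a random binary `Y`) is an assumption made only so that the example runs; it is not the asker's data.

```r
# Simulated stand-in for the asker's data (assumption): 60 numeric columns
# named V1..V60, plus a binary response Y appended as column 61.
set.seed(1)
mydata <- as.data.frame(matrix(rnorm(200 * 60), nrow = 200))
mydata$Y <- rbinom(200, size = 1, prob = 0.5)

# Collapse the names of columns 20 to 50 into one right-hand side string,
# then paste it onto the response to get a single formula string.
rhs <- paste(names(mydata)[20:50], collapse = " + ")
f <- as.formula(paste("Y ~", rhs))

fit <- glm(f, data = mydata, family = binomial())
length(coef(fit))  # intercept plus 31 predictors
```

Passing one pasted string to `as.formula()` avoids relying on how it treats character vectors of length greater than one.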

3 Answers


For different reasons I am not excited about either of the current answers.

lm( Y ~ . , data= mydata[ , c( grep("^Y$", names(mydata) ), 20:50) ] )

(The numeric column position of Y is looked up with grep, so Y is selected from 'mydata' together with the predictors. This means there is no confusion arising from Y coming in separately from 'mydata'.)
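The same column-subsetting idea carries over to the `glm()` call from the question. The sketch below assumes a simulated `mydata` with a binary `Y` stored in the data frame; the names `y_col` and `sub` are introduced here for illustration only.

```r
# Simulated stand-in data (assumption): 60 predictors and a binary Y.
set.seed(42)
mydata <- as.data.frame(matrix(rnorm(200 * 60), nrow = 200))
mydata$Y <- rbinom(200, size = 1, prob = 0.5)

# Find the position of Y by name, then keep Y plus columns 20 to 50.
y_col <- grep("^Y$", names(mydata))
sub <- mydata[, c(y_col, 20:50)]

# With only Y and the wanted predictors left, "." expands to all of them.
fit <- glm(Y ~ ., data = sub, family = binomial())
```

Because `sub` contains nothing but the response and the 31 chosen columns, the `Y ~ .` shorthand is safe here; it would pull in unwanted predictors if used on the full data frame.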

eddi
IRTFM
y <- 1:100
mydata <- as.data.frame(matrix(rnorm(10000),nrow=100))
lm(y~.,data=mydata) # all the variables
lm(y~.,data=mydata[,20:30]) # just some of them
Thomas
  • the only problem with this one is that it separates the response from the predictors. Things usually work better if you can keep them all in a common data frame ... – Ben Bolker Jun 12 '13 at 21:37

This may not be the most elegant way to do it, but here is how I do it with lm(). You of course would just change theFormula string to match your use of glm().

myData <- data.frame(Y=rnorm(100),x1=rnorm(100),x2=rnorm(100))
theNames <- names(myData)[2:3]
theFormula <- paste0("lm(Y ~ ",paste(theNames,collapse=" + "),", data=myData)")
theModel <- eval(parse(text=theFormula))
Andrew Barr
  • you don't need `eval(parse())`. You can just do `lm(reformulate(theNames,response="Y"),data=myData)` – Ben Bolker Jun 12 '13 at 21:36
  • @BenBolker do you know if it is possible to use `reformulate`, possibly with `update.formula`, to include covariates with quadratic terms? – Simon O'Hanlon Jun 12 '13 at 21:52
  • how about `f <- reformulate(...); f2 <- update(f,.~.^2)` ? (I haven't tried it.) This gives *all* second-order interactions, but doesn't give interactions between continuous predictors: I might wimp out and do `reformulate(paste0("poly(",paste(theNames,collapse=","),",degree=2)"),response="Y")` – Ben Bolker Jun 12 '13 at 21:54
  • Re: "... but doesn't give interactions between continuous predictors": I use rms/Hmisc and crossed-`rcs` terms to get continuous-by-continuous interactions. The integration with a lattice plotting environment makes examination of the conditional effects very revealing. – IRTFM Jun 12 '13 at 22:17
  • it occurs to me that it would be nice if `y ~ poly(.,degree=2)` just worked, but it doesn't seem to ... – Ben Bolker Jun 13 '13 at 01:39
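The `reformulate()` suggestions in the comments above can be sketched as follows, reusing the `myData` layout from the answer. This is an illustration under those assumptions rather than a tested recommendation from the commenters; the names `f_main`, `f_int`, `poly_term`, and `f_quad` are introduced here.

```r
set.seed(7)
myData <- data.frame(Y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
theNames <- names(myData)[2:3]

# Main effects only: Y ~ x1 + x2
f_main <- reformulate(theNames, response = "Y")

# Add all pairwise interactions: Y ~ x1 + x2 + x1:x2
# (this does not add squared terms such as x1^2).
f_int <- update(f_main, . ~ .^2)

# One multivariate poly() term covers linear, squared, and cross terms.
poly_term <- paste0("poly(", paste(theNames, collapse = ", "), ", degree = 2)")
f_quad <- reformulate(poly_term, response = "Y")

fit_main <- lm(f_main, data = myData)
fit_quad <- lm(f_quad, data = myData)
```

For `glm()`, the same formula objects can be passed as the first argument along with the `family` of your choice.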