
It looks simple, but I don't know how to code it in R. I have a data frame (df) with ~100 variables, and I would like to fit a multiple regression with my first variable (Y) as the response and variables 25 to 60 as regressors. The problem is that I don't want to write out each variable name, like:

lm(Y~var25+var26+.......var60, data=df)

I would like to use something like [, 25:60] to select the whole range. I have tried this, but it doesn't work:

test <- lm(Y~df[, 25:60], data=df)
summary(test)

Any ideas?

Cettt
Darwin PC

1 Answer


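You could subset the dataset to only those columns and then call lm, using `.` on the right-hand side to mean "all remaining columns" (df1 is the example data defined at the end of this answer):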

lm(Y~., data=df1[c(1,25:60)])

If you need var25 to var60 by name and the columns are ordered by name (so Y is column 1 and var25 is column 26), shift the indices by one:

lm(Y~., data=df1[c(1,26:61)])   
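To avoid the off-by-one bookkeeping between positions and names, the columns can also be selected by name. The following is only a sketch on the toy data from this answer; the object names `regressors`, `fit_by_name` and `fit_by_pos` are made up for illustration.

```r
# Same toy data as in the "data" section of this answer
set.seed(24)
df1 <- as.data.frame(matrix(sample(1:80, 20*101, replace=TRUE),
   ncol=101, dimnames=list(NULL, c('Y', paste0('var', 1:100)))))

# Select the response and the regressors by name instead of by position
regressors <- paste0('var', 25:60)          # hypothetical helper vector
fit_by_name <- lm(Y ~ ., data = df1[c('Y', regressors)])

# Positional equivalent, valid only for this particular column layout
fit_by_pos <- lm(Y ~ ., data = df1[c(1, 26:61)])

# Both calls see the same columns, so the coefficients should agree
all.equal(coef(fit_by_name), coef(fit_by_pos))
```

Note that the toy data has only 20 rows, so with 36 regressors lm cannot estimate every coefficient and reports the surplus ones as NA; with real data that has more rows than regressors this does not happen.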

Another option is to use paste to build the formula from the variable names:

lm(paste("Y ~", paste(paste0('var', 25:60), collapse="+")), data=df1)
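As a variation on the pasted string, base R's `reformulate()` (from the stats package) builds a formula object directly from a character vector of term labels. This is a sketch on the same toy data, not part of the original answer; the names `f` and `fit` are illustrative.

```r
# Same toy data as in the "data" section of this answer
set.seed(24)
df1 <- as.data.frame(matrix(sample(1:80, 20*101, replace=TRUE),
   ncol=101, dimnames=list(NULL, c('Y', paste0('var', 1:100)))))

# Build Y ~ var25 + var26 + ... + var60 as a formula object
f <- reformulate(paste0('var', 25:60), response = 'Y')

# Fit the model with the generated formula
fit <- lm(f, data = df1)
summary(fit)
```

Because `f` is a real formula rather than a string, it can be inspected with `all.vars(f)` or reused in other modelling functions that expect a formula.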

data

set.seed(24)
df1 <- as.data.frame(matrix(sample(1:80, 20*101, replace=TRUE),
   ncol=101, dimnames=list(NULL, c('Y', paste0('var', 1:100)))))
akrun
  • I think the `subset=` only applies to cases, not columns. – thelatemail Feb 15 '15 at 06:02
  • @thelatemail: Not so: `subset(df1, select=c(1,26:60))` ... `subset` has both a subset(rows) and a select(columns) option. – IRTFM Feb 15 '15 at 06:56
  • @BondedDust - I think you're confusing `?subset` with `subset=` in `?lm` – thelatemail Feb 15 '15 at 22:05
  • I may be misconstruing the point of your comment (which at the moment doesn't match up with anything in the question), but I don't believe I am confusing the arguments of the `subset` function. – IRTFM Feb 15 '15 at 22:32