I want to improve the way to insert predictors in a regression function:

fm <- lm(formula= df$dependent_variable ~ df[,2] + df[,3]+ df[,4], data = df)

Here `df` is a data frame.

In this example I use only 3 predictors and 1 dependent variable, but I actually have 191 predictors. I think I need a loop to add all of them. Any suggestions?

Giffredo
  • Looks like you need a loop, but your question is not clear. If you want a single formula, consider `lm(dependent_variable~., data=df)`; `paste` or `formula` may be needed as well. – akrun Jul 17 '15 at 13:33
  • Sorry, I probably was not clear. What I want is a formula equivalent to `lm(formula= df$dependent_variable ~ df[,2] + df[,3]+...+ df[,n], data = df)` where n=191. I don't understand your point. – Giffredo Jul 17 '15 at 13:49
  • Have you tried the one from my comment? It should work, i.e. `lm(dependent_variable~., data=df)` – akrun Jul 17 '15 at 13:50
  • Or use `reformulate` with the index of the predictors i.e. `lm(reformulate(names(df)[2:ncol(df)], response='dependent_variable'), df)` – akrun Jul 17 '15 at 13:53
  • OK, now I understand. Both approaches work, although `summary()` doesn't give me the expected results: almost all coefficients are NA. I have to check my data frame. – Giffredo Jul 17 '15 at 14:05
  • It must be related to your dataset, or the model doesn't have enough degrees of freedom. – akrun Jul 17 '15 at 14:06
  • I have a data frame of 96 rows x 192 columns. The dependent variable is one column. I would like to use 191 predictors, but the formula gives me an error saying that the maximum number of predictors is 96. The summary still reports results, treating the first 96 columns as predictors. I don't understand the problem. – Giffredo Jul 20 '15 at 08:42
  • As I mentioned earlier, the code works on an example I created. So it must be related to the degrees of freedom, i.e. you don't have enough observations to estimate all the coefficients. – akrun Jul 20 '15 at 08:49
  • But why does it work for the first 96 columns and not for the 97th? Each predictor has the same number of observations (96). Should I conclude that my data frame needs to be square, 192x192, for this to work? – Giffredo Jul 20 '15 at 11:51
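
The two formula shortcuts suggested in the comments above can be compared on a small toy data frame. This is only a sketch: the column names `x1`, `x2`, `x3` and the random data are made up for illustration and are not the asker's data.

```r
# Toy data (hypothetical): one response column followed by three predictors
set.seed(1)
df <- data.frame(
  dependent_variable = rnorm(20),
  x1 = rnorm(20),
  x2 = rnorm(20),
  x3 = rnorm(20)
)

# Option 1: "." expands to every column of `data` except the response
fm_dot <- lm(dependent_variable ~ ., data = df)

# Option 2: build the formula from column names selected by position
fm_ref <- lm(
  reformulate(names(df)[2:ncol(df)], response = "dependent_variable"),
  data = df
)

# Both calls fit the same model, so the coefficients should agree
same_fit <- isTRUE(all.equal(coef(fm_dot), coef(fm_ref)))
print(same_fit)
```

Either form avoids writing `df$...` or `df[, i]` inside the formula; with `data = df` supplied, bare column names are enough.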

1 Answer

Here is one possible solution:

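yname <- "DVnamehere"                    # name of the dependent variable
xnames <- setdiff(colnames(dat), yname)  # every other column is a predictor
# Build "DVnamehere ~ x1 + x2 + ..." (assumes syntactically valid column names)
fml <- as.formula(paste(yname, "~", paste(xnames, collapse = " + ")))
model <- lm(fml, data = dat)
summary(model)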

Although this is not a loop, it only requires you to specify the name of the dependent variable; it uses the remaining variables in the data set as predictors and puts everything into the regression formula. Does this help?

costebk08
  • Yes, this also works, but it gives me the same problem as the other options. – Giffredo Jul 20 '15 at 08:47
  • Well, looking at the dimensions of your data frame, you definitely have too many predictors for too few participants. Though we are stepping away from a programming issue, the basic rule of thumb is that you need at least 5 participants for every predictor. Thus, in your case, 191x5=955 participants, and that is a very conservative estimate. In truth you probably want more than that. I would recommend either eliminating predictors or conducting a factor analysis to reduce the number of predictors. Also, if the answer helped, please upvote/accept it. Thank you! – costebk08 Jul 20 '15 at 14:46
  • So if I have understood correctly, I have too many predictors compared with the number of observations. So it is a limit of the regression method, not an R programming problem. – Giffredo Jul 21 '15 at 10:07
  • That is correct. My guess is that if you ran this data in any other stats software you would encounter a similar issue. Thank you for accepting the answer; please upvote as well if it helped you. – costebk08 Jul 21 '15 at 14:07
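
The limit discussed in these comments can be reproduced with simulated data of the same shape (96 rows, 1 response, 191 predictors). This is a sketch on random numbers, not the asker's data; the names `y` and `V2`...`V192` are placeholders. With n observations, `lm` can estimate at most n coefficients, intercept included, and it reports the rest as `NA`.

```r
# Simulated data (hypothetical): 96 observations, 1 response + 191 predictors
set.seed(42)
n <- 96
p <- 191
dat <- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
names(dat)[1] <- "y"

fm <- lm(y ~ ., data = dat)

n_estimated <- sum(!is.na(coef(fm)))  # coefficients lm could estimate
n_dropped   <- sum(is.na(coef(fm)))   # coefficients reported as NA

# The rank of the design matrix cannot exceed the number of rows,
# so estimated coefficients (intercept included) are capped at n
print(c(rank = fm$rank, estimated = n_estimated,
        dropped = n_dropped, residual_df = df.residual(fm)))
```

Because the intercept counts as one coefficient, the cap applies to the intercept plus the leading predictors together, and a fit that uses every available degree of freedom leaves none for residual variance, which is why `summary()` cannot report meaningful standard errors. A square 192x192 data frame would not solve this; the number of rows would need to comfortably exceed the number of predictors.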