
I have two character vectors of column names and a matrix:

X <- c("A", "B", "C", "D") # and Y is "Y"
Z <- c("R", "T", "G", "U", "I")
XY <- matrix(1:150, ncol = 10)
colnames(XY) <- c("Y", X, Z)

The objective is to do:

for (i in 1: length(X)){
     for (j in 1:length(X)){
          lm(Y~X[i]+X[j], data=XY)
     }
}

The problem is that X[1] is the string "A", and lm(Y~"A", data=XY) does not work.

Neither cat(X) nor factor(X) helps, and neither do cat(X[1]) or factor(X[1]).

4castle
xav
  • You mean *quotes* right? You can't remove them, they denote a character variable, but they are not really *there*. Try `cat( X )` – Simon O'Hanlon Oct 29 '13 at 22:50
  • also `factor` can be useful, depending on what is your aim. – Jilber Urbina Oct 29 '13 at 22:51
  • Yes, quotes. Thanks, but cat(X) does not work for my purpose because X[1] is "A", and I need X[1] to be A. – xav Oct 29 '13 at 22:53
  • However, lm(Y~cat(X[1]), data=data.frame(XY)) does not work, and neither does lm(Y~factor(X[1]), data=data.frame(XY)), where X and Y are colnames(XY). – xav Oct 29 '13 at 22:57
  • Ummmm. use `lm( Y ~ A , data = X )`, where `X` is the data.frame that contains your variable name. Read the manual. – Simon O'Hanlon Oct 29 '13 at 22:59
  • Is `lm(Y ~ ., data = X[, c(1,2)])` what you want? or `lm(Y ~ ., data = yourdata[ , X[c(1,2)]])` – Dason Oct 29 '13 at 23:04
  • lm( Y ~ A , data = X ) is not a solution to iterate over elements of X, which are colnames(XY). – xav Oct 29 '13 at 23:04
  • Thank you, but no. I want to iterate over the elements of X in a regression. – xav Oct 29 '13 at 23:05
  • What does that mean? Why don't you try to describe the problem you're actually trying to solve? Providing a reproducible example would really help out, and I'm wondering why we haven't asked you to provide one yet... – Dason Oct 29 '13 at 23:06
  • @xav Do you actually have a data.frame that has columns called `Y` and whatever you want on the RHS of your formula with values for each observation? Please also read [**how to make a great reproducible example**](http://stackoverflow.com/q/5963269/1478381) and update your question accordingly! – Simon O'Hanlon Oct 29 '13 at 23:07

1 Answer


In R, a formula is a symbolic representation of a model. You can create formulas from character strings, but you cannot mix symbols and character strings. For example, you could do:

lm(Y~X+Z,data = XY)

or you could do something like:

f <- as.formula(paste0("Y~",paste("X","Z",sep = "+")))
lm(formula = f,data = XY)
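As a hedged aside, base R's `reformulate()` can assemble the same kind of formula from a character vector of term names, without pasting the text by hand. The sketch below runs on a small made-up data frame; `df`, its columns `Y`, `A`, `B`, and the name `terms_chr` are assumptions for illustration, not the asker's data.

```r
# Hypothetical toy data; column names are for illustration only
set.seed(1)
df <- data.frame(Y = rnorm(20), A = rnorm(20), B = rnorm(20))

terms_chr <- c("A", "B")

# Option 1: paste the formula text together, then convert it
f1 <- as.formula(paste("Y ~", paste(terms_chr, collapse = " + ")))

# Option 2: let reformulate() build it from the term labels
f2 <- reformulate(terms_chr, response = "Y")

fit1 <- lm(f1, data = df)
fit2 <- lm(f2, data = df)

# Both routes should describe the same model
same_coefs <- all.equal(coef(fit1), coef(fit2))
```

Either route gives `lm()` a real formula object whose terms are symbols rather than quoted strings, which is what the original attempt was missing.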

In your case, that means you probably need to build the formula manually each time like this:

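XY_df <- as.data.frame(XY)  # lm() needs a data frame, not a matrix
fits <- list()              # keep the fitted models instead of discarding them
for (i in seq_along(X)) {
     for (j in seq_along(X)) {
          # Build the formula text, e.g. "Y~A+B", then convert it to a formula
          f <- as.formula(paste0("Y~", paste(X[i], X[j], sep = "+")))
          fits[[paste(X[i], X[j], sep = "+")]] <- lm(formula = f, data = XY_df)
     }
}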

But then, this example makes little sense: why use only the variables A-D and not loop through the others (R, T, G, etc.)? Presumably the intent was to fit models with all combinations of two covariates? As I said, the example is rather confusing.

More generally, fitting models in this fashion is a terrible idea, and you should not do it at all. Anything you learn by fitting linear regression models one by one using every possible pair of covariates is just as likely to be statistical noise as signal. Not to mention that, as you've set this up, you will fit some models using the same variable twice (when i = j), in which case you will have two perfectly collinear variables.
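If you go ahead despite that caveat, one way to at least avoid the i = j case is to enumerate each unordered pair once with `combn()`. This is only a sketch under assumptions: `dat` is a made-up data frame of random numbers standing in for XY, and `pairs` and `fits` are hypothetical names.

```r
set.seed(42)
# Hypothetical data frame standing in for XY (random values, same column names)
dat <- as.data.frame(matrix(rnorm(150), ncol = 10))
names(dat) <- c("Y", "A", "B", "C", "D", "R", "T", "G", "U", "I")

X <- c("A", "B", "C", "D")

# combn() yields each unordered pair once, so a variable is never paired with itself
pairs <- combn(X, 2, simplify = FALSE)

# Fit one model per pair, building each formula from the column names
fits <- lapply(pairs, function(p) {
  lm(reformulate(p, response = "Y"), data = dat)
})
names(fits) <- vapply(pairs, paste, character(1), collapse = "+")

length(fits)  # one model per pair, i.e. choose(4, 2)
```

This removes the duplicated and self-paired fits, but it does nothing about the multiple-comparisons concern raised above.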

joran