
Instead of something like lm(bp~height+age, data=mydata) I would like to specify the columns by number, not name.

I tried lm(mydata[[1]]~mydata[[2]]+mydata[[3]]) but the problem with this is that, in the fitted model, the coefficients are named mydata[[2]], mydata[[3]] etc, whereas I would like them to have the real column names.

Perhaps this is a case of not being able to have your cake and eat it, but if the experts could advise whether this is possible I would be grateful.

Jim G.
LeelaSella
  • You might get better answers if you give a slightly larger context for what you're trying to do: "what is the problem you are trying to solve"? – Ben Bolker Oct 12 '11 at 15:23
  • Thanks for your comment. I have a large number of columns in a dataframe. I am fitting a linear model using a subset of these, using various techniques including stepwise selection. It would be convenient if I could refer to the columns by number when calling lm(), but if possible I would like the model to show the column names. – LeelaSella Oct 12 '11 at 15:28
  • I would paste together a formula based on the names, as in: http://stackoverflow.com/questions/6877534/understanding-lm-and-environment/6878461#6878461 – Ben Bolker Oct 12 '11 at 15:30
  • If you come up with a good solution you're allowed (encouraged) to post an answer to your own question ... – Ben Bolker Oct 12 '11 at 15:52

2 Answers

lm(
    as.formula(paste(colnames(mydata)[1], "~",
        paste(colnames(mydata)[c(2, 3)], collapse = "+"),
        sep = ""
    )),
    data=mydata
)

Instead of c(2, 3) you can use as many indices as you want (no need for a for loop).
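As a minimal sketch of the approach above, here it is run on a small made-up data frame (the values, and the choice of `bp`, `height` and `age` as columns 1 to 3, are assumptions for illustration only). It also shows `reformulate()` from the stats package, which should build the same formula from the same names.

```r
# Toy data, invented purely for illustration
mydata <- data.frame(
    bp     = c(120, 135, 128, 140, 150, 132),
    height = c(170, 165, 180, 175, 160, 172),
    age    = c(30, 45, 38, 52, 60, 41)
)

# Column numbers for the response and the predictors
resp  <- 1
preds <- c(2, 3)

# Build the formula text "bp~height+age" from the column names
f <- as.formula(paste(colnames(mydata)[resp], "~",
    paste(colnames(mydata)[preds], collapse = "+"),
    sep = ""
))
fit <- lm(f, data = mydata)

# The coefficients carry the real column names
names(coef(fit))

# Alternative: let reformulate() assemble the formula from the same names
f2   <- reformulate(colnames(mydata)[preds], response = colnames(mydata)[resp])
fit2 <- lm(f2, data = mydata)
```

Either way, the column numbers are only used to look up names; the model itself is fitted on a formula that uses those names.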

Tomas
  • missing a comma at the end of the third line? – Ben Bolker Oct 12 '11 at 16:32
  • Thanks, @Ben. Also, maybe using `as.formula` would be more robust; it is not needed for `lm()`, but other model functions do need it. – Tomas Oct 12 '11 at 16:45
  • Thank you. This spells out what Ben Bolker suggested earlier, and works perfectly. – LeelaSella Oct 12 '11 at 21:09
  • To make this completely foolproof I needed to add backticks around the column names because of special characters in the names: `paste('\`', colnames(mydata)[c(2,3)], '\`', sep = "", collapse = "+")` – Evertvdw Jan 10 '18 at 09:30
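The backtick suggestion in the last comment can be sketched as follows. The data frame and its column names are made up for illustration, and `quote_name` is a hypothetical helper, not part of any package.

```r
# Toy data with non-syntactic column names (spaces, parentheses);
# check.names = FALSE stops data.frame() from rewriting them
mydata <- data.frame(
    "blood pressure" = c(120, 135, 128, 140, 150, 132),
    "height (cm)"    = c(170, 165, 180, 175, 160, 172),
    age              = c(30, 45, 38, 52, 60, 41),
    check.names = FALSE
)

# Hypothetical helper: wrap each name in backticks so the parser reads it as one symbol
quote_name <- function(x) paste0("`", x, "`")

rhs <- paste(quote_name(colnames(mydata)[c(2, 3)]), collapse = " + ")
f   <- as.formula(paste(quote_name(colnames(mydata)[1]), "~", rhs))

fit <- lm(f, data = mydata)
names(coef(fit))
```

Without the backticks, a name such as `height (cm)` would be parsed as an expression rather than as a single variable, so the formula would either fail to parse or refer to the wrong thing.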
lm(mydata[,1] ~ ., mydata[-1])

The trick, which I found in a course on R, is to remove the response column from the data; otherwise you get the warning "essentially perfect fit: summary may be unreliable". I do not know why it works; it does not follow from the documentation. Normally, we keep the response column in.
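A likely explanation, sketched on made-up data: `.` stands for every column of `data` that is not already named in the formula. The response here is written as `mydata[, 1]`, not by its column name, so if that column stays in `data` it is also picked up as a predictor and the response ends up predicting itself. The toy values below are assumptions for illustration.

```r
# Toy data, invented purely for illustration
mydata <- data.frame(
    bp     = c(120, 135, 128, 140, 150, 132),
    height = c(170, 165, 180, 175, 160, 172),
    age    = c(30, 45, 38, 52, 60, 41)
)

# Response column left in the data: "bp" appears among the predictors
fit_keep <- lm(mydata[, 1] ~ ., data = mydata)
names(coef(fit_keep))

# Response column dropped: only the other columns are used as predictors
fit_drop <- lm(mydata[, 1] ~ ., data = mydata[-1])
names(coef(fit_drop))
```

Calling `summary(fit_keep)` is the step that should raise the "essentially perfect fit" warning, since the model reproduces the response exactly.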

And a simplified version of the earlier answer by Tomas:

lm(
    as.formula(paste(colnames(mydata)[1], "~ .")),
    data=mydata
)
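Note that `~ .` uses every remaining column of `data`. To fit on only some columns by number, one option is to subset the data frame first. This is a minimal sketch on made-up data; the extra `weight` column and the `cols` vector are assumptions for illustration.

```r
# Toy data with one extra column that should be left out of the model
mydata <- data.frame(
    bp     = c(120, 135, 128, 140, 150, 132),
    height = c(170, 165, 180, 175, 160, 172),
    age    = c(30, 45, 38, 52, 60, 41),
    weight = c(70, 82, 77, 90, 68, 75)
)

# First entry is the response column, the rest are the predictors
cols <- c(1, 2, 3)

fit <- lm(
    as.formula(paste(colnames(mydata)[cols[1]], "~ .")),
    data = mydata[cols]
)
names(coef(fit))
```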