
My apologies for the dumb question... but I can't seem to find a simple solution.

I want to extract the standardized coefficients from a fitted linear model (in R). There must be a simple way or function that does that; can you tell me what it is?

EDIT (following some of the comments below): I should probably have provided more context for my question. I was teaching an introductory R workshop to a bunch of psychologists. For them, a linear model without the ability to get standardized coefficients is as if you didn't run the model at all (OK, this is a bit of an exaggeration, but you get the point). When we ran some regressions, this was their first question, which (my bad) I didn't anticipate (I'm not a psychologist). Of course I can program this myself, and of course I can look for packages that do it for me. But at the same time, I do think this is a basic and commonly required feature of linear models, so on the spot I thought there should be a base function that does it without the need to install more and more packages (which is perceived as a difficulty for beginners). So I asked (and this was also an opportunity to show them how to get help when they need it).

My apologies to those who think I asked a stupid question, and my many thanks to those who took the time to answer it.

amit
  • try this function `stdcoeff <- function (MOD) {b <- summary(MOD)$coef[-1, 1] ; sx <- sapply(MOD$model[-1], sd) ; sy <- sd(MOD$model[[1]]) ; beta <- b * sx/sy ; return(beta) }` where `MOD` is your model produced by the `lm` function, so you'll use it as `stdcoeff(lm(...))` (I didn't write it, just found it on the net, so not posting as an answer; a runnable version is sketched after these comments) – David Arenburg Jun 19 '14 at 11:21
  • I liked [this approach](http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf). It rescales the input variables by two times the standard deviation for easy interpretation. Its `standardize` function can be found in the `arm` package. – horseoftheyear Jun 19 '14 at 12:08
  • @CarlWitthoft, where do you see the solution to this question in the help files? Whether @DavidArenburg's comment or the `QuantPsyc::lm.beta` answer below is correct depends on what the OP means by "standardized" (which they didn't specify) – Ben Bolker Jun 19 '14 at 12:12
  • The help files clearly locate the coefficients `lm` generates. Anyone who's qualified to do statistical analysis should know the equations which relate the various values `lm` returns to whatever parameters they wish to calculate. I'm not just being snooty here: people who blindly trust some computer program to do everything for them will never learn to evaluate the quality (accuracy, etc) of the answer they get. It's like using `confint` in Excel without ever knowing that it's calculated at the 95% level. – Carl Witthoft Jun 19 '14 at 13:09
  • oops, I was actually wrong about "several definitions" -- sorry. I read @DavidArenburg's comment too quickly (and didn't remember that `lm` stored the model frame there); his comment is the same as the internal code in `QuantPsyc::lm.beta` – Ben Bolker Jun 19 '14 at 13:55
  • whut evvarrr.. sorry for being a grouch here. – Carl Witthoft Jun 19 '14 at 15:22
  • See my edit in the body of the question for some responses to the comments above. Many thanks to those who contributed. – amit Jun 20 '14 at 15:33
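
For reference, here is a runnable version of the helper quoted in @DavidArenburg's comment above, tested on the built-in women data (the sapply fix for multi-predictor models is an editorial adjustment, not part of the original comment):

stdcoeff <- function(MOD) {
  b  <- summary(MOD)$coef[-1, 1]   # unstandardized slopes
  sx <- sapply(MOD$model[-1], sd)  # SDs of the predictors (model frame minus response)
  sy <- sd(MOD$model[[1]])         # SD of the response
  b * sx / sy                      # standardized betas
}

stdcoeff(lm(weight ~ height, data = women))
   height 
0.9954948 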

3 Answers


There is a convenience function in the QuantPsyc package for that, called lm.beta. However, I think the easiest way is to just standardize your variables. The coefficients will then automatically be the standardized "beta"-coefficients (i.e. coefficients in terms of standard deviations).

For instance,

 lm(scale(your.y) ~ scale(your.x), data=your.Data)

will give you the standardized coefficient.

Are they really the same? The following illustrates that both are identical:

library("QuantPsyc")
mod <- lm(weight ~ height, data=women)
coef_lmbeta <- lm.beta(mod)

coef_lmbeta
> height 
  0.9955 

mod2 <- lm(scale(weight) ~ scale(height), data=women)
coef_scale <- coef(mod2)[2]

coef_scale
> scale(height) 
  0.9955 

all.equal(coef_lmbeta, coef_scale, check.attributes=F)
[1] TRUE

which shows that both are identical, as they should be.

How to avoid clumsy variable names? In case you don't want to deal with variable names such as scale(height), one option is to standardize the variables outside the lm call, in the dataset itself. For instance,

women2 <- lapply(women, scale) # standardizes all variables

mod3 <- lm(weight ~ height, data=women2)
coef_alt <- coef(mod3)[2]
coef_alt
height 
0.9955 

all.equal(coef_lmbeta, coef_alt)
[1] TRUE

How do I standardize multiple variables conveniently? In the likely event that you don't want to standardize all variables in your dataset, you can pick out all those that occur in your formula. For instance, referring to the mtcars dataset now (since women only contains height and weight):

Say the following is the regression model I want to estimate:

 modelformula <- mpg ~ cyl + disp + hp + drat + qsec

We can use the fact that all.vars gives us a vector of the variable names.

 all.vars(modelformula)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "qsec"

We can use this to subset the dataset accordingly. For instance,

mycars <- lapply(mtcars[, all.vars(modelformula)], scale) 

will give me a dataset in which all variables have been standardized. Linear regressions using mycars will now give standardized betas. Please make sure that standardizing all these variables makes sense, though!
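
For instance, fitting the original formula on the standardized data then yields the standardized betas directly (a quick sketch, reusing the mycars and modelformula objects from above):

mod_std <- lm(modelformula, data = mycars)
coef(mod_std)  # slopes are now in SD units; the intercept is essentially zero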

Potential issue with only one variable: In case your model formula contains only one explanatory variable and you are working with the built-in dataframes (and not with tibbles), the following adjustment is advisable (credits go to @JerryT in the comments):

mycars <- lapply(mtcars[, all.vars(modelformula), drop=F], scale) 

This is because when you extract only one column from a standard data frame, R returns a vector instead of a dataframe. drop=F prevents this from happening. This is also not a problem if e.g. tibbles are used. See e.g.

class(mtcars[, "mpg"])
[1] "numeric"
class(mtcars[, "mpg", drop=F])
[1] "data.frame"
library(tidyverse)
class(as_tibble(mtcars)[, "mpg"])
[1] "tbl_df"     "tbl"        "data.frame"

Another issue with missing values in the dataframe (credits go again to @JerryT in the comments): By default, R's lm removes all rows in which at least one column is missing. scale, on the other hand, uses all non-missing values of a column, even if the observation has a missing value in another column. If you want to mimic the behavior of lm, you may want to drop all rows with missing values first, like so:

all_complete <- complete.cases(df)  # TRUE for rows without any missing values
df <- df[all_complete, ]            # keep only the complete rows
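
Putting the pieces together, a hedged end-to-end sketch (reusing mtcars and modelformula from above; mtcars happens to have no missing values, but the pattern generalizes):

dat <- mtcars[, all.vars(modelformula)]
dat <- dat[complete.cases(dat), ]  # drop incomplete rows first, mimicking lm's default na.omit
dat[] <- lapply(dat, scale)        # then standardize the remaining rows
mod_std <- lm(modelformula, data = dat)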
coffeinjunky
  • note that this is more or less what `arm::standardize` does (although it offers some flexibility as to whether the response is scaled or not, and does some fancier/non-standard stuff with dummies for categorical predictors). The advantage of the `lm.beta` approach is that it doesn't require re-fitting the model. – Ben Bolker Jun 20 '14 at 16:08
  • The `dplyr` alternative to standardizing multiple variables would be: `mycars <- mtcars %>% mutate_each_(funs(scale), all.vars(model.formula))`, I believe. – Jake Fisher Sep 28 '16 at 18:18
  • Make sure you drop NAs (if there are any) before you scale. Also, use `lapply(mtcars[, all.vars(modelformula), drop=F], scale)` in case there is only one variable in `modelformula`. There is also the `lm.beta` package, which does the same. – Jerry T Jan 18 '19 at 02:15
  • @JerryT I believe `scale` can handle missing values, but that is a very good point about `drop=F`. I will include it in my answer to warn people about it! – coffeinjunky Jan 18 '19 at 08:09
  • @coffeinjunky You are such an excellent teacher here and explain things very well. I think `lapply` applies `scale` column-wise, so if, for example, `cyl` has an NA in row 1 and `disp` has an NA in row 2, then `lm` will remove both rows 1 and 2 due to NA, but `scale` will only remove row 1 for `cyl` and row 2 for `disp`. So I think it is better to drop NAs before `scale`, to be consistent with `lm`'s na.action (assuming na.exclude or na.omit); see the sketch after these comments. – Jerry T Jan 18 '19 at 16:33
  • @JerryT Fair point. Hadn't thought about that! Thanks for the explanation. Will update in a short while! – coffeinjunky Jan 18 '19 at 17:18
  • The object `mycars` is defined in that line, i.e. `mycars[] <- ...` will fail unless you have instantiated it before, and `lm` will work just fine with the list. If you want to keep it as a dataframe, you can always wrap it up in `data.frame()`. In general though, there is a lot to be said about proper workflow and handling of R objects. I really wish StackOverflow answers would be intended as tutorials for new beginners. – coffeinjunky Jan 19 '19 at 11:36
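
To make @JerryT's point about missing values concrete, here is a small toy example (my own construction, not from the thread): lm drops every incomplete row, while scale standardizes each column using that column's own non-missing values:

d <- data.frame(y = 1:5, x1 = c(NA, 2:5), x2 = c(1:4, NA))
nobs(lm(y ~ x1 + x2, data = d))  # 3: lm drops rows 1 and 5
sum(!is.na(scale(d$x1)))         # 4: scale uses all non-missing values of x1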

The lm.beta package has several functions for working with standardised coefficients, including lm.beta(), which requires an lm object:

library(lm.beta)  # a different package from QuantPsyc, despite the shared function name
res <- lm(y ~ x)  # y and x stand in for your response and predictor
lm.beta(res)
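
As a quick sanity check (my own sketch, not part of the original answer), this reproduces the 0.9955 from the accepted answer on the built-in women data:

library(lm.beta)
res <- lm(weight ~ height, data = women)
lm.beta(res)$standardized.coefficients  # height: ~0.9955, matching the QuantPsyc result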
luchonacho

Just use colnames(data) with lapply or sapply.
For example:

lapply(data[, colnames(data)], scale)
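
Note that this returns a plain list of standardized columns. A hedged sketch of how you might rebuild a data frame from it and fit a model (data and the response name y are placeholders, not defined in the original answer):

data_std <- as.data.frame(lapply(data, scale))  # data[, colnames(data)] is just data itself
mod <- lm(y ~ ., data = data_std)               # assumes the response column is named y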
eli-k