Linear regression coefficient information as Data Frame or Matrix

Question

I am trying to create a script to optimize a linear regression analysis, and I would really like to operate on the model output, specifically the Pr(>|t|) values. Unfortunately, I do not know how to get the model output into a matrix or data frame.

Here is an example: in the code below, I create seven columns of data and fit the seventh using the other six. The model summary makes it clear that three of the parameters are much more significant than the other three. If I had access to the coefficient output numerically, I could write a script that drops the least significant parameter and re-runs the analysis; as it is, I am doing this manually.

What is the best way to do this?

q = matrix( 
c(2,14,-4,1,10,9,41,8,13,2,0,20,3,27,1,10,-1,0,
10,-6,23,6,13,-8,1,15,-7,55,7,14,10,0,20,-3,6,4,20,
-1,5,19,-2,48,10,19,8,8,10,-2,24,8,13,9,8,14,5,7,7,
12,1,0,16,7,27,7,10,-1,1,15,7,31,2,20,-5,10,12,3,57,
0,19,-8,8,11,-4,63,5,11,7,8,10,-7,6,9,10,-7,2,19,8,
51,2,18,3,3,14,4,30), nrow=15, ncol=7, byrow = TRUE)
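# Name the six predictors A-F and the response Z
colnames(q) <- c("A","B","C","D","E","F","Z")
# lm() takes its variables from a data frame
q <- as.data.frame(q)
# Regress Z on all other columns
qmodel <- lm(Z~.,data=q)
# Print the usual summary, including the coefficient table
summary(qmodel)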

Output:

Call:
lm(formula = Z ~ ., data = q)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.25098 -0.52655 -0.02931  0.62350  1.26649 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.09303    1.51627  -1.380    0.205    
A            0.91161    0.11719   7.779 5.34e-05 ***
B            1.99503    0.09539  20.914 2.87e-08 ***
C           -2.98252    0.04789 -62.283 4.91e-12 ***
D            0.13458    0.10377   1.297    0.231    
E            0.15191    0.09397   1.617    0.145    
F            0.01417    0.04716   0.300    0.772    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9439 on 8 degrees of freedom
Multiple R-squared:  0.9986,    Adjusted R-squared:  0.9975 
F-statistic: 928.9 on 6 and 8 DF,  p-value: 6.317e-11

Now here is what I'd like to see:

 > coeffs
             Estimate Std. Error t value Pr(>|t|)
 (Intercept) -2.09303    1.51627  -1.380 2.05e-01
 A            0.91161    0.11719   7.779 5.34e-05
 B            1.99503    0.09539  20.914 2.87e-08
 C           -2.98252    0.04789 -62.283 4.91e-12
 D            0.13458    0.10377   1.297 2.31e-01
 E            0.15191    0.09397   1.617 1.45e-01
 F            0.01417    0.04716   0.300 7.72e-01

As it is, I built that table by copying the numbers out of the summary by hand, which is not automated at all:

coeffs = matrix(
c(-2.09303,1.51627,-1.38,0.205,0.91161,0.11719,
7.779,0.0000534,1.99503,0.09539,20.914,0.0000000287,
-2.98252,0.04789,-62.283,0.00000000000491,0.13458,
0.10377,1.297,0.231,0.15191,0.09397,1.617,0.145,
0.01417,0.04716,0.3,0.772), nrow=7, ncol=4, byrow = TRUE)
#
rownames(coeffs) <- c("(Intercept)","A","B","C","D","E","F")
colnames(coeffs) <- c("Estimate","Std. Error","t value","Pr(>|t|)")
#
coeffs <- as.data.frame(coeffs)
#
coeffs

Hong Ooi · Accepted Answer · 2014-08-18T23:06:23.313


What you want is the coefficients component of the summary object.

m <- lm(Z~.,data=q)

summary(m)$coefficients
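As a sketch of working with that matrix, the code below uses R's built-in mtcars data and an arbitrary formula as a stand-in for the question's q. It converts the table to a data frame and pulls out the p-values by column name. The column names are the ones shown in the question's output.

```r
# Stand-in data: R's built-in mtcars (the question's q works the same way)
m <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Numeric matrix: one row per term, columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_mat <- summary(m)$coefficients

# The same table as a data frame, if data-frame semantics are preferred
coeffs <- as.data.frame(coef_mat)

# p-values as a named vector (the names are the term names)
p_values <- coef_mat[, "Pr(>|t|)"]
print(p_values)

# Least significant non-intercept term, i.e. the one with the largest p-value
worst_term <- names(which.max(p_values[names(p_values) != "(Intercept)"]))
print(worst_term)
```

`coef(summary(m))` returns the same matrix and is a common alternative spelling.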

Some further comments:

Use step to do stepwise variable selection rather than coding it yourself;
Stepwise variable selection has bad statistical properties; consider something like glmnet (in the package of the same name) to do regularized model building instead.
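Below is a minimal sketch of `step()` doing backward elimination. It again uses the built-in mtcars data as a stand-in, and the five starting predictors are an arbitrary choice. Note that `step()` drops terms by AIC, not by the Pr(>|t|) cutoff described in the question, so the terms it keeps can differ from the result of a p-value rule.

```r
# Stand-in data: R's built-in mtcars; the five predictors are arbitrary
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# Backward elimination: repeatedly drop the term whose removal lowers AIC
# the most, stopping when no removal lowers it; trace = 0 hides the log
reduced <- step(full, direction = "backward", trace = 0)

# Formula and coefficient table of the selected model
print(formula(reduced))
print(summary(reduced)$coefficients)
```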

edited Aug 18 '14 at 23:06

answered Aug 18 '14 at 22:40


Hong, this is brilliant! Exactly what I was looking for... although now I am wondering if my approach is flawed. I was not aware that stepwise variable selection has bad statistical properties. Can you elaborate a bit? What sort of errors am I likely to encounter? – rucker Aug 19 '14 at 00:17
Basically, stepwise methods are prone to overfit your data, meaning they'll mistake noise for signal. The problem is worst when you have small datasets and lots of variables, but you still need to be careful even with big datasets. For more info check out CrossValidated, the statistics/machine learning StackExchange. http://stats.stackexchange.com/questions/tagged/stepwise-regression – Hong Ooi Aug 19 '14 at 01:38
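The "mistaking noise for signal" point can be illustrated with a small simulation. This is a sketch under assumed settings: the response and all six predictors are independent random noise, the 15-row, 6-predictor shape is chosen to mirror the question, and the seed and the 200 replicates are arbitrary. The code counts how often backward `step()` still keeps at least one predictor even though none carries any signal.

```r
set.seed(42)   # arbitrary seed so the run is reproducible
n <- 15        # rows, mirroring the question's data
k <- 6         # candidate predictors, also mirroring the question
n_sims <- 200  # number of simulated data sets (arbitrary)

false_hit <- logical(n_sims)
for (i in seq_len(n_sims)) {
  # Predictors and response are independent N(0, 1) noise: there is no signal
  d <- as.data.frame(matrix(rnorm(n * (k + 1)), nrow = n))
  names(d) <- c(paste0("X", seq_len(k)), "Y")
  # Backward elimination by AIC, starting from the full model
  fit <- step(lm(Y ~ ., data = d), direction = "backward", trace = 0)
  # TRUE if at least one pure-noise predictor survived selection
  false_hit[i] <- length(attr(terms(fit), "term.labels")) > 0
}

# Share of noise-only data sets in which step() still kept a predictor
cat("Proportion with spurious predictors:", mean(false_hit), "\n")
```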

Barranka · Answer 2 · 2014-08-18T23:20:46.697


If I understand correctly, you need the matrix returned by summary(). That's pretty straightforward:

fit <- lm( formula, data=yourData)
coeffs <- summary(fit)$coefficients

After that, you can select the rows of coeffs that match your conditions, just like with any matrix. For example, to keep only the terms whose p-value (column 4) is below 1e-12:

coeffs[coeffs[, 4] < 1e-12, , drop = FALSE]
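The drop-the-least-significant loop described in the question can be built around this matrix. The sketch below runs it on the question's data. `drop_least_significant` is a hypothetical helper, not a base R function, and the 0.05 cutoff is an assumption. The helper also assumes numeric predictors, so that coefficient names match column names. The caveats about stepwise selection in the accepted answer apply to it as well.

```r
# The question's data, copied from above
q <- matrix(
c(2,14,-4,1,10,9,41,8,13,2,0,20,3,27,1,10,-1,0,
10,-6,23,6,13,-8,1,15,-7,55,7,14,10,0,20,-3,6,4,20,
-1,5,19,-2,48,10,19,8,8,10,-2,24,8,13,9,8,14,5,7,7,
12,1,0,16,7,27,7,10,-1,1,15,7,31,2,20,-5,10,12,3,57,
0,19,-8,8,11,-4,63,5,11,7,8,10,-7,6,9,10,-7,2,19,8,
51,2,18,3,3,14,4,30), nrow=15, ncol=7, byrow = TRUE)
colnames(q) <- c("A","B","C","D","E","F","Z")
q <- as.data.frame(q)

# Hypothetical helper: backward elimination by p-value.
# Assumes numeric predictors, so coefficient names equal column names.
drop_least_significant <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  repeat {
    # Every predictor removed: fall back to the intercept-only model
    if (length(predictors) == 0) {
      return(lm(reformulate("1", response = response), data = data))
    }
    fit <- lm(reformulate(predictors, response = response), data = data)
    p <- summary(fit)$coefficients[, "Pr(>|t|)"]
    p <- p[names(p) != "(Intercept)"]  # never consider dropping the intercept
    if (max(p) < alpha) return(fit)    # all remaining terms pass the cutoff
    worst <- names(which.max(p))
    if (!worst %in% predictors) stop("coefficient name does not match a column")
    # Drop the single least significant predictor and refit
    predictors <- setdiff(predictors, worst)
  }
}

final <- drop_least_significant(q, "Z")
print(summary(final)$coefficients)
```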

edited Aug 18 '14 at 23:20

answered Aug 18 '14 at 22:47

