I am looking for a function that takes the dependent variable and the independent variables as input and returns both the regression summary and a 5 number summary for each of the independent variables. Here is an example of my setup:

attach(iris)
five_num=matrix(0,nrow=3,ncol=6)
rownames(five_num)=c('Sepal.Width','Petal.Length','Petal.Width')
colnames(five_num)=c('Min','1st Qu','Median','Mean','3rd Qu','Max')
for (i in 1:3){
  five_num[i,]=summary(eval(parse(text=rownames(five_num)[i])))
}

Then I just print the regression and 5 number summaries:

summary(lm(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width,data=iris))
Call:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, 
    data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.82816 -0.21989  0.01875  0.19709  0.84570 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.85600    0.25078   7.401 9.85e-12 ***
Sepal.Width   0.65084    0.06665   9.765  < 2e-16 ***
Petal.Length  0.70913    0.05672  12.502  < 2e-16 ***
Petal.Width  -0.55648    0.12755  -4.363 2.41e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3145 on 146 degrees of freedom
Multiple R-squared:  0.8586,    Adjusted R-squared:  0.8557 
F-statistic: 295.5 on 3 and 146 DF,  p-value: < 2.2e-16



five_num
             Min 1st Qu Median  Mean 3rd Qu Max
Sepal.Width  2.0    2.8   3.00 3.057    3.3 4.4
Petal.Length 1.0    1.6   4.35 3.758    5.1 6.9
Petal.Width  0.1    0.3   1.30 1.199    1.8 2.5

I would like to make a function that looks like this and would return the same thing:

reg_5_num=function(dependent,independent){
  # code here
}

The main issue I run into is that when I store the names of my independent variables, I cannot pass them to a regression, because the formula needs plus signs between them.

In addition, I would like the function to handle interaction terms. For example, if my regression is

summary(lm(Sepal.Length~Sepal.Width:Petal.Length+Petal.Width,data=iris))

Call:
lm(formula = Sepal.Length ~ Sepal.Width:Petal.Length + Petal.Width, 
    data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.80414 -0.24478 -0.02936  0.25741  0.94391 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               4.14976    0.07771  53.397  < 2e-16 ***
Petal.Width              -0.31056    0.11365  -2.733  0.00705 ** 
Sepal.Width:Petal.Length  0.18510    0.01654  11.191  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3524 on 147 degrees of freedom
Multiple R-squared:  0.8213,    Adjusted R-squared:  0.8189 
F-statistic: 337.8 on 2 and 147 DF,  p-value: < 2.2e-16

I would still like to see the same five_num.

Bobe Kryant
  • `as.formula(paste(dependent,paste(independent,collapse="+"),sep="~"))`, you can mess around with this to also include interaction terms. – slamballais Feb 20 '16 at 10:06
  • @Laterow thanks for your response, although I do not see how I can include interaction terms. – Bobe Kryant Feb 20 '16 at 10:16
  • Ok, here's the general breakdown. `lm` accepts objects of the class `formula`. You can create a `formula` by taking a string and using the `as.formula` function. So, this should work: `x=rnorm(10); y=rnorm(10); lm(as.formula("y ~ x"))`. So, the goal is to make a string that fits your formula. I use `paste` to create such a string. Look into the `collapse` and `sep` arguments and what they do. There are also other ways to generate formulas, try google for that. – slamballais Feb 20 '16 at 10:19
  • Possible duplicate of [how to use a character string in formula](http://stackoverflow.com/questions/17024685/how-to-use-a-character-string-in-formula) – slamballais Feb 20 '16 at 10:22
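
The approach described in the comments above can be sketched as follows. This is a minimal illustration, not the commenter's exact code: the variable names `f1`, `f2`, `f3`, and `fit` are made up for the example, and `reformulate()` (from the base `stats` package) is shown as an alternative to pasting the string by hand.

```r
# Hypothetical inputs: the response name and a vector of predictor names.
dependent   <- "Sepal.Length"
independent <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# Option 1: paste the pieces into one string, then convert it to a formula.
# collapse = " + " joins the predictors; sep = " ~ " joins both sides.
f1 <- as.formula(paste(dependent, paste(independent, collapse = " + "), sep = " ~ "))

# Option 2: reformulate() assembles the same formula from the term labels.
f2 <- reformulate(independent, response = dependent)

# An interaction term is just another element of the term vector.
f3 <- reformulate(c("Sepal.Width:Petal.Length", "Petal.Width"), response = dependent)

# Passing data explicitly avoids depending on attach(iris).
fit <- lm(f3, data = iris)
summary(fit)
```

Because the predictors stay in a character vector, the same vector can be reused to decide which columns to summarise.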

1 Answer

I pass the independent and dependent variables to the function as strings, and it builds both the regression and the summary table from them. Like the code in the question, this relies on attach(iris) having been run.

independent='Sepal.Width:Petal.Length+Petal.Width'
dependent='Sepal.Length'

regs=function(independent,dependent) {
  # Build the formula from the two strings and fit the model once.
  # No data argument is passed, so the variables must be visible via attach(iris).
  fit=lm(as.formula(paste(c(dependent,independent),collapse='~')))
  # Split on ':' and '+' to recover the individual variable names.
  vars=unique(unlist(strsplit(independent, "[:+]")))
  five_num=matrix(0,nrow=length(vars),ncol=6)
  rownames(five_num)=vars
  colnames(five_num)=c('Min','1st Qu','Median','Mean','3rd Qu','Max')
  for (i in seq_along(vars)){
    five_num[i,]=summary(get(vars[i]))
  }
  print(summary(fit))
  print(five_num)
}

This then returns:

regs(independent,dependent)

Call:
lm(formula = as.formula(paste(c(dependent, independent), collapse = "~")))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.80414 -0.24478 -0.02936  0.25741  0.94391 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               4.14976    0.07771  53.397  < 2e-16 ***
Petal.Width              -0.31056    0.11365  -2.733  0.00705 ** 
Sepal.Width:Petal.Length  0.18510    0.01654  11.191  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3524 on 147 degrees of freedom
Multiple R-squared:  0.8213,    Adjusted R-squared:  0.8189 
F-statistic: 337.8 on 2 and 147 DF,  p-value: < 2.2e-16

             Min 1st Qu Median  Mean 3rd Qu Max
Sepal.Width  2.0    2.8   3.00 3.057    3.3 4.4
Petal.Length 1.0    1.6   4.35 3.758    5.1 6.9
Petal.Width  0.1    0.3   1.30 1.199    1.8 2.5
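
A possible variant, sketched under a few assumptions: the function name `reg_5_num` is taken from the question, but the `data` argument, the returned list, and its element names `regression` and `five_num` are choices made for this example. It uses `all.vars()` to pull the variable names out of the formula instead of splitting the string, so operators such as `*` or extra spaces should not need special handling, and it passes the data frame explicitly so `attach()` is not required.

```r
# Hypothetical variant of regs(): takes the data frame explicitly and
# returns its results instead of only printing them.
reg_5_num <- function(dependent, independent, data) {
  f <- as.formula(paste(dependent, independent, sep = " ~ "))

  # eval(bquote(...)) splices the formula object into the call, so the
  # "Call:" line of the summary shows the formula rather than the paste() code.
  fit <- eval(bquote(lm(.(f), data = data)))

  # f[[3]] is the right-hand side; all.vars() lists every variable it names,
  # whichever operator (+, :, *) joins them.
  vars <- all.vars(f[[3]])

  # One row of summary() statistics per predictor (assumes numeric columns).
  five_num <- t(sapply(data[vars], summary))

  list(regression = summary(fit), five_num = five_num)
}

res <- reg_5_num("Sepal.Length", "Sepal.Width:Petal.Length + Petal.Width", iris)
res$regression
res$five_num
```

Returning a list rather than printing keeps both pieces available for later use, for example `res$five_num["Petal.Width", ]`.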
Bobe Kryant