I am running into an issue in R and am not quite sure what is happening. When I run a regression and a t.test on the same variables, the t.test appears to drop ~100 participants (the DF is 283.93 for the t-test and 382 for the regression), which gives me different p-values. However, if I compute the group means separately for the full sample, they are the same as those shown in the t-test output.

Can anyone explain what might be happening? Below is the code and output for both the regression and the t-test. Note that the DV is a 1 to 7 variable and the IV is a 1/0 dummy.

The regression output

Call:
lm(formula = confident ~ get.surgery, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2989 -0.7767  0.2233  0.7011  1.7011 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.29893    0.07714  68.692  < 2e-16 ***
get.surgery  0.47777    0.14895   3.208  0.00145 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.293 on 382 degrees of freedom
Multiple R-squared:  0.02623,   Adjusted R-squared:  0.02368 
F-statistic: 10.29 on 1 and 382 DF,  p-value: 0.001451

and the t-test

t.test(confident ~ get.surgery, data = d)

Welch Two Sample t-test

data:  confident by get.surgery
t = -3.6106, df = 233.93, p-value = 0.0003737
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7384624 -0.2170709
sample estimates:
mean in group 0 mean in group 1 
       5.298932        5.776699 
lmo
  • To make this a programming question, you should really provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). But it really seems more like a statistics question about how the degrees of freedom are calculated for each test. That's a question that probably better belongs on Cross Validated (stats.stackexchange.com), where statistical questions are on-topic. – MrFlick Jan 04 '17 at 21:14
  • I think this might be a pooled variance issue. Try your t.test with var.equal = TRUE (default is FALSE) to see if that helps. – Joy Jan 04 '17 at 21:23
  • @Joy - that solved it! Thank you so much! – Allie Lieberman Jan 04 '17 at 21:49
  • Yay! I'll post it as an answer and you can select it as "right" then. – Joy Jan 04 '17 at 21:49
  • perfect thanks! I am curious if you know why R would drop observations in this case? I am not sure why 100 observations would be dropped when assuming unequal variance? Thanks so much again for your help! – Allie Lieberman Jan 05 '17 at 00:55

1 Answer

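I think this might be a pooled variance issue. Try your t.test with var.equal = TRUE (the default is FALSE) to see if that helps.

Note that no observations are being dropped. With var.equal = FALSE, t.test runs Welch's test, which does not assume equal variances in the two groups and uses the Welch-Satterthwaite approximation for the degrees of freedom. That approximation is usually non-integer and smaller than n - 2, which is why the DF looks as if participants are missing. lm assumes a single residual variance, so it matches the pooled-variance (var.equal = TRUE) t-test instead.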
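Since the question has no reproducible data, here is a minimal sketch on simulated data that shows the same pattern. Everything about the data is an assumption for illustration: the group sizes (281 and 103, chosen only so the total is 384 as in the regression output), the means, and the standard deviations are made up, and the values are not restricted to the 1 to 7 scale. The names `fit`, `pooled`, `welch`, and `welch_df` are just local variables.

```r
# Hypothetical data, NOT the asker's dataset: 384 rows, a 0/1 dummy,
# and deliberately unequal variances in the two groups.
set.seed(1)
n0 <- 281
n1 <- 103
d <- data.frame(
  get.surgery = rep(c(0, 1), times = c(n0, n1)),
  confident   = c(rnorm(n0, mean = 5.3, sd = 1.4),
                  rnorm(n1, mean = 5.8, sd = 1.0))
)

fit    <- summary(lm(confident ~ get.surgery, data = d))
pooled <- t.test(confident ~ get.surgery, data = d, var.equal = TRUE)
welch  <- t.test(confident ~ get.surgery, data = d)  # default: var.equal = FALSE

# The pooled-variance t-test reproduces the regression p-value,
# and its degrees of freedom are n0 + n1 - 2 = 382.
fit$coefficients["get.surgery", "Pr(>|t|)"]
pooled$p.value
pooled$parameter

# Welch's test uses every observation as well; only the df formula differs.
# Welch-Satterthwaite degrees of freedom computed by hand:
v   <- tapply(d$confident, d$get.surgery, var)     # per-group variances
n   <- tapply(d$confident, d$get.surgery, length)  # per-group sample sizes
se2 <- v / n                                       # squared SE of each group mean
welch_df <- sum(se2)^2 / sum(se2^2 / (n - 1))

welch_df
welch$parameter  # same value as welch_df
```

The sign of the t statistic differs between the two (t.test reports group 0 minus group 1, lm reports the coefficient for group 1 relative to group 0), but the two-sided p-values agree once var.equal = TRUE is set.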

Joy