6

I know that if I have a set of data, I can run t.test to do a T test. But I only know the count, mean and standard deviation for each set. I'm sure there must be a way to do this in R, but I can't figure it out. Any help?

C8H10N4O2
Xodarap

3 Answers

14

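This uses the formula for the t statistic with unequal variances and unequal sample sizes, which applies to an unpaired t-test. The function below returns only the test statistic, not the degrees of freedom or the p-value.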

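# t statistic for two independent samples, computed from summary statistics.
# mu, n, s: length-2 vectors of group means, sample sizes and standard deviations.
t.test.fromSummaryStats <- function(mu, n, s) {
  # (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)
  -diff(mu) / sqrt(sum(s^2 / n))
}

mu <- c(0.1, 0.136)   # group means
n  <- c(5, 7)         # group sizes
s  <- c(0.01, 0.02)   # group standard deviations
t.test.fromSummaryStats(mu, n, s)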
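
To go from the statistic to a full test, one option is to add the Welch-Satterthwaite degrees of freedom and look up the p-value with pt. The following is only a sketch in base R: welch_from_summary is a made-up helper name, not a function from any package, and it assumes two independent, roughly normal samples. For the example numbers it should agree with the t = -4.0988 and df = 9.238 that tsum.test reports in another answer on this page, but check it before relying on it.

```r
# Sketch: Welch two-sample t-test from summary statistics (base R only).
# welch_from_summary is a hypothetical helper name, not part of any package.
welch_from_summary <- function(mu, n, s, conf.level = 0.95) {
  se2  <- s^2 / n                             # squared standard error per group
  se   <- sqrt(sum(se2))                      # standard error of the difference
  est  <- mu[1] - mu[2]                       # difference in means (first minus second)
  tval <- est / se                            # Welch t statistic
  df   <- sum(se2)^2 / sum(se2^2 / (n - 1))   # Welch-Satterthwaite approximation
  pval <- 2 * pt(-abs(tval), df)              # two-sided p-value
  half <- qt(1 - (1 - conf.level) / 2, df) * se
  list(t = tval, df = df, p.value = pval,
       conf.int = est + c(-1, 1) * half)      # CI for the difference in means
}

res <- welch_from_summary(mu = c(0.1, 0.136), n = c(5, 7), s = c(0.01, 0.02))
str(res)
```

The function returns a plain list rather than an "htest" object to keep the sketch short; a one-sided test would need a different pt call.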
Ari B. Friedman
  • **Please** check this for accuracy before using it for anything important! – Ari B. Friedman Apr 03 '11 at 01:36
  • Hmm, that certainly is one way to do it. I have to do a Welch's T-Test, which is somewhat more difficult, but yeah I guess I could do it myself. – Xodarap Apr 03 '11 at 01:54
  • There may be a function for it in one of the CRAN packages, but it's a simple enough calculation that it shouldn't be too bad to write your own function for (of course, there are benefits to using pre-existing functions if it has been written already: it's tested, and your solution transfers more easily). – Ari B. Friedman Apr 03 '11 at 02:11
10

You can certainly calculate the formula by hand or simulate. But if you want a quick function call, there is tsum.test in the BSDA package (see ?tsum.test), which makes the Welch t-test quite easy. Using @AriB.Friedman's numbers:

library(BSDA)
tsum.test(mean.x=.1,   s.x=.01, n.x=5,
          mean.y=.136, s.y=.02, n.y=7)
# 
#         Welch Modified Two-Sample t-Test
# 
# data:  Summarized x and y
# t = -4.0988, df = 9.238, p-value = 0.002538
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -0.05579113 -0.01620887
# sample estimates:
# mean of x mean of y 
#     0.100     0.136
gung - Reinstate Monica
6

If you don't want to recode the formula yourself, you can always simulate a data set that has exactly the summaries you have, then analyse the simulated data. The mvrnorm function in the MASS package can generate normal data with a given mean and variance; set the empirical argument to TRUE so that the sample mean and variance match the requested values exactly rather than only in expectation.
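
A minimal sketch of that approach, reusing the summary numbers from the other answers. exact_sample is a hypothetical helper name introduced here for illustration; it assumes the MASS package is available (it normally ships with R) and that each group has at least two observations.

```r
library(MASS)

# Hypothetical helper: draw n values whose sample mean and sample sd
# are exactly `mean` and `sd` (empirical = TRUE rescales the draw).
exact_sample <- function(n, mean, sd) {
  as.vector(mvrnorm(n, mu = mean, Sigma = matrix(sd^2), empirical = TRUE))
}

set.seed(1)  # the values vary with the seed, but the summaries do not
x <- exact_sample(5, 0.1,   0.01)
y <- exact_sample(7, 0.136, 0.02)

c(mean(x), sd(x))   # summaries of the simulated first group
sim <- t.test(x, y) # Welch test by default (var.equal = FALSE)
sim
```

Because the t-test depends on the data only through the group sizes, means and standard deviations, the result does not depend on which particular values were simulated.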

Greg Snow