I have a dataset named 'dat' with 5 columns: month; mean0; sd0; mean1; sd1. It looks like the following (but with numbers):

month mean0 sd0 mean1 sd1

1
2
3
..
48

I would like to use an independent (not paired) t-test to compare mean0 and mean1 for every month from 1 to 48. Ideally, the output would go into another data frame, called 'dat1', with columns for the t-statistic, the degrees of freedom (DF), and the p-value. Like so:

month t-statistic DF p-value
1
2
3
..
48

I have tried using dplyr and broom packages, but cannot seem to figure it out. Any help would be appreciated.

Karolis Koncevičius
eabc0351
  • Please share sample of your data using `dput()` (not `str` or `head` or picture/screenshot) so others can help. See more here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – Tung Sep 20 '18 at 02:15
  • Hi, welcome to Stack Overflow. Please take the time to read https://stackoverflow.com/help/how-to-ask. We cannot help you if you do not provide any code. – core114 Sep 20 '18 at 05:51

1 Answer

You'll also need the sample size (n) behind each mean and SD; a t-test cannot be computed from means and SDs alone. The tsum.test function from the BSDA package runs a t-test from summary statistics, so you don't have to write your own function.
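For readers who prefer not to add a package, the same unequal-variance (Welch) test can be sketched in base R directly from the summary statistics. This is only an illustration under stated assumptions: the helper name `welch_from_summary` and the columns `n0` and `n1` (group sizes) are hypothetical and do not appear in the question's data, and the numbers are made up.

```r
# Welch (unequal-variance) two-sample t-test from summary statistics,
# vectorised so that it handles every row (month) at once.
# NOTE: welch_from_summary, n0 and n1 are hypothetical names for illustration.
welch_from_summary <- function(mean0, sd0, n0, mean1, sd1, n1) {
  v0 <- sd0^2 / n0                       # squared standard error, group 0
  v1 <- sd1^2 / n1                       # squared standard error, group 1
  t_stat <- (mean0 - mean1) / sqrt(v0 + v1)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))
  p <- 2 * pt(-abs(t_stat), df)          # two-sided p-value
  data.frame(t.statistic = t_stat, DF = df, p.value = p)
}

# Made-up summary data, laid out with the question's column names plus n0/n1
summ <- data.frame(month = 1:3,
                   mean0 = c(24, 11, 34), sd0 = c(1.3, 4.2, 2.3), n0 = c(30, 31, 30),
                   mean1 = c(18, 8, 22),  sd1 = c(1.8, 3.4, 1.8), n1 = c(30, 31, 30))

# One row of results per month, in the shape the question asks for
dat1 <- cbind(month = summ$month,
              with(summ, welch_from_summary(mean0, sd0, n0, mean1, sd1, n1)))
print(dat1)
```

Because the arithmetic is vectorised, no apply loop is needed; each row of `summ` yields one row of `dat1`.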

There remains the larger question of the advisability of doing a large number of comparisons in this manner. This link provides information about that.
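One common way to address the multiple-comparison concern is to adjust the p-values after the tests are run. The sketch below uses `p.adjust` from base R's stats package; the raw p-values are made up for illustration, and whether Holm, Benjamini-Hochberg, or some other method is appropriate depends on the analysis.

```r
# Hypothetical raw p-values, one per comparison (made-up numbers)
p_raw <- c(0.001, 0.012, 0.030, 0.045, 0.200)

# Holm controls the family-wise error rate;
# Benjamini-Hochberg ("BH") controls the false discovery rate.
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

adjusted <- data.frame(p_raw, p_holm, p_bh)
print(adjusted)
```

With 48 monthly tests, the adjusted values could be stored as an extra column next to the raw p-values.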

With that caveat, here's how to do what you want with some arbitrary data:

library(BSDA)  # provides tsum.test

dat <- data.frame(m1=c(24,11,34),
                  sd1=c(1.3,4.2,2.3),
                  n1=c(30, 31, 30),
                  m2=c(18,8,22), 
                  sd2=c(1.8, 3.4, 1.8),
                  n2=c(30,31,30))

# user function to do t-test and return desired values
do.tsum <- function(x) {
    # tsum.test is quirky, so you have to break out each column's value
    results <- tsum.test(x[1],x[2],x[3],x[4],x[5],x[6],alternative='two.sided')
    return(c(results$statistic, results$parameters, results$p.value))
}

# use apply to do the tsum.test on each row (1 for rows, 2 for cols)
# then, transpose the resulting matrix and use the data.frame function
t.results <- data.frame(t(apply(dat, 1, do.tsum)))

# unfortunately the p-value is returned without a proper column name (it comes back as 'm1'),
# so use the names function to rename the third column.
names(t.results)[3] <- 'p.value'

Output is as follows:

          t       df      p.value
1 14.800910 52.78253 1.982944e-20
2  3.091083 57.50678 3.072783e-03
3 22.504396 54.83298 2.277676e-29
Edward Carney