How to iterate a t-test in R with data that has more than two arguments or categorical variables?

Question

I have dataset with comprises observations on a field, at 2 differents years (year1, year2), which compares 3 treatments (A, B, C). A long list of identical variables (var1, var2, var3... var20) was measured for each treatment for both years.

example of data structure

I would like to learn how to iterate a t-test which will evaluate the the difference between the means between year1 and year2, for each treatment and each variable, one-by-one.

I could not find how to pair treatments: 't-test for var1 for treatment A in year1 with treatment A in year2'; 't-test for var1 B in year1 with B in year2'; etc.

It would be ideal if the p-values of the t-test could be aggregate in a table, to simplify the export of the result.

Thanks!!

Please read [(1)](https://stackoverflow.com/help/how-to-ask) how do I ask a good question, [(2)](https://stackoverflow.com/help/mcve) how to create a MCVE as well as [(3)](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610) how to provide a minimal reproducible example in R. Then edit and improve your question accordingly. I.e., abstract from your real problem... — Christoph, Mar 26 '20 at 16:30
So, dput a few lines of your data and relevant columns. Indexing. year1_A_idx <- which(my_data_frame$Year == 1 & my_data_frame$Treatment == A & others as necessary). Then call by index my_data_frame$some-variable[year1_A_idx, (and the columns you want)]. & for 'and', | for 'or'. HTH — Chris, Mar 26 '20 at 17:41

score 0 · Answer 1 · answered Mar 26 '20 at 18:15

Let's say your data is a data.frame called df, like below:

df = data.frame(year=sample(1:2,100,replace=TRUE),
treatment=sample(LETTERS[1:3],100,replace=TRUE),
var1=rnorm(100),var2=rnorm(100))

df$year = factor(df$year)

One option is to use genefilter, that does a t.test for every column, and you first split by the treatment and apply this function for every subset:

library(genefilter)
result = by(df,df$treatment,function(x){
test = colttests(x=as.matrix(x[,c("var1","var2")]),fac=x$year)
data.frame(
treatment = unique(x$treatment),
var = rownames(test),
pvalue=test$p.value
)
})

head(result)
    treatment  var     pvalue
A.1         A var1 0.60966026
A.2         A var2 0.70644999
B.1         B var1 0.29378349
B.2         B var2 0.63731701
C.1         C var1 0.91665543
C.2         C var2 0.06207974

Otherwise, use dplyr, broom and you need to pivot it long so that we can apply the same formula value ~ year for every group:

library(dplyr)
library(broom)
library(tidyr)

df %>% pivot_longer(-c(year,treatment)) %>% 
group_by(treatment,name) %>% 
do(tidy(t.test(value ~ year,data=.)))

# A tibble: 6 x 12
# Groups:   treatment, name [6]
  treatment name  estimate estimate1 estimate2 statistic p.value parameter
  <fct>     <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>
1 A         var1    0.159    0.0225    -0.136      0.518  0.609       23.1
2 A         var2    0.126    0.216      0.0899     0.386  0.703       23.8
3 B         var1   -0.364   -0.142      0.222     -1.03   0.317       16.7
4 B         var2    0.209    0.00951   -0.199      0.444  0.663       15.2
5 C         var1   -0.0387  -0.176     -0.137     -0.103  0.919       29.7
6 C         var2    0.651    0.488     -0.162      1.85   0.0744      27.6
# … with 4 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
#   alternative <chr>

How to iterate a t-test in R with data that has more than two arguments or categorical variables?

1 Answers1