Group by and run multiple t tests in R

Question

I have the following dataset (dput here):

# A tibble: 3,713 x 17
      ID Age   Group      RHR   HRV Sleep.Onset Wake.Onset Hours.in.Bed Hours.of.Sleep Sleep.Disturbances Latency.min Cycles REM.Sleep.hours Deep.Sleep.hours
   <int> <chr> <chr>    <int> <int>       <dbl>      <dbl>        <dbl>          <dbl>              <int>       <dbl>  <int>           <dbl>            <dbl>
 1  5027 Young Increase    58    73      0.180       0.458         6.66           5.33                  9        8.98      6            1.4              0.32
 2  5027 Young Increase    83    27      0.162       0.542         9.1            6.84                 15        3.48      9            1.19             1.54
 3  5027 Young Increase    57    85      0.113       0.318         4.92           4.43                  5        1.98      4            1.32             0.44
 4  5027 Young Increase    60    70      0.0975      0.319         5.32           3.75                  3       26.5       4            1.02             0.14
 5  5027 Young Increase    63    72      0.105       0.329         5.38           4.74                  5        2.48      5            1.32             0.07
 6  5027 Young Increase    62    61      0.983       0.472        11.8            9.44                  9        4.48      8            2.07             0.84
 7  5027 Young Increase    66    68      0.142       0.426         6.83           5.48                 15        2.98      6            1.48             0.35
 8  5027 Young Increase    81    28      0.0908      0.177         2.06           1.93                  2        2.48      1            0.22             0.22
 9  5027 Young Increase    69    57      0.158       0.443         6.85           6.58                 13        0.48      6            2.43             0   
10  5027 Young Increase    63    60      0.0859      0.318         5.58           5.47                  4        0.48      5            1.34             0.13
# ... with 3,703 more rows, and 3 more variables: Light.Sleep.hours <dbl>, Awake.hours <dbl>, Session <chr>

I am trying to run a t-test on every variable, comparing the two Session levels (Pre vs. Post) within each combination of Age and Group.

library(dplyr)

df %>%
    select(-ID) %>%
    group_by(Age, Group) %>%
    summarize_at(
        vars(-group_cols(), -Session),
        list(p.value = ~ t.test(. ~ Session)$p.value))

This works for the p-values:

# A tibble: 4 x 15
# Groups:   Age [2]
  Age   Group    RHR_p.value HRV_p.value Sleep.Onset_p.value Wake.Onset_p.value Hours.in.Bed_p.value Hours.of.Sleep_p~ Sleep.Disturban~ Latency.min_p.v~
  <chr> <chr>          <dbl>       <dbl>               <dbl>              <dbl>                <dbl>             <dbl>            <dbl>            <dbl>
1 Old   Decrease     0.0594        0.865              0.495              0.885               0.316             0.307              0.148          0.00237
2 Old   Increase     0.00920       0.634              0.0979             0.0514              0.00774           0.00762            0.247          0.933  
3 Young Decrease     0.0975        0.259              0.779              0.760               0.959             0.975              0.256          0.181  
4 Young Increase     0.115         0.604              0.846              0.164               0.140             0.242              0.692          0.412  
# ... with 5 more variables: Cycles_p.value <dbl>, REM.Sleep.hours_p.value <dbl>, Deep.Sleep.hours_p.value <dbl>, Light.Sleep.hours_p.value <dbl>,
#   Awake.hours_p.value <dbl>

However, I am struggling to get the other test results (means, SDs, t, df, 95% CI) for each Pre/Post comparison, and to correct the p-values for multiple comparisons. Any help is appreciated.

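I think I may need to convert the data to long format and use something like this (rstatix):

library(dplyr)
library(tidyr)
library(rstatix)

df %>%
    select(-ID) %>%
    pivot_longer(-c(Age, Group, Session), names_to = "variable", values_to = "value") %>%
    group_by(variable, Age, Group) %>%
    t_test(value ~ Session, detailed = TRUE) %>%  # detailed = TRUE adds estimates and CI
    adjust_pvalue(method = "bonferroni") %>%
    add_significance()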
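To make the "convert to long format" step concrete, here is a minimal sketch on a tiny hypothetical data frame. The values and the names wide and long are made up for illustration and are not taken from the real df. tidyr::pivot_longer() stacks the measurement columns into variable/value pairs, so each variable can then be grouped and tested on its own.

```r
library(tidyr)

# Hypothetical miniature of the question's data; the values are made up.
wide <- data.frame(
  Age = "Young",
  Group = "Increase",
  Session = c("Pre", "Pre", "Post", "Post"),
  RHR = c(58, 62, 60, 57),
  HRV = c(73, 61, 70, 85)
)

# Stack the measurement columns: one row per night and variable.
long <- pivot_longer(wide, c(RHR, HRV), names_to = "variable", values_to = "value")
long
```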

I'm having trouble accessing your dataset, but just curious: is there a reason you want to run multiple t-tests? As I understand it, this increases the chance of incorrectly finding significance, because the error rates of the individual tests accumulate. Your normal alpha level is 5%. By running two t-tests on the same data you increase your chance of "making a mistake" to roughly 10%, and with 3 tests to roughly 15%. This is an issue. An omnibus test run first (testing several parameters at once), such as ANOVA or ANCOVA, would be more appropriate for that purpose. — Shawn Hemelstrand, Jan 31 '22 at 00:38
@ShawnHemelstrand I'm not sure why it keeps locking permission, but we are adjusting the t-test p-values. It is for group comparisons for a publication. — CanyonView, Feb 01 '22 at 00:01
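For reference, a quick sketch of the arithmetic behind the comments above, assuming k tests each run at alpha = 0.05. If the tests are independent, the exact family-wise error rate is 1 - (1 - alpha)^k. The 10% and 15% figures match the simpler additive (Bonferroni) bound k * alpha, which slightly overstates the exact rate but holds without the independence assumption. The variable names are illustrative only.

```r
alpha <- 0.05
k <- 1:3

# Exact probability of at least one false positive, assuming independent tests.
fwer_exact <- 1 - (1 - alpha)^k
# Additive (Bonferroni) upper bound; does not require independence.
fwer_bound <- pmin(k * alpha, 1)

round(fwer_exact, 4)  # 0.0500 0.0975 0.1426
fwer_bound            # 0.05 0.10 0.15

# Bonferroni correction: each of 3 tests is judged at alpha / 3 instead of alpha.
alpha / 3
```

Adjusting the p-values (for example with p.adjust(p, method = "bonferroni"), which multiplies each p-value by the number of tests and caps it at 1) is equivalent to comparing the raw p-values with alpha / k.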

GuedesBF · Accepted Answer · 2022-01-29T05:33:34.273

Data frames can only hold certain object classes as column types, and htest is not one of them. However, we can store lists as list-columns. If we adapt the current code to wrap each htest result in a list, we can later extract elements of the tests separately.
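A minimal illustration of the list-column idea, using the built-in mtcars data as a stand-in for the question's df (tt and tests are illustrative names, not part of the answer's code):

```r
library(tibble)

# Built-in mtcars: compare mpg between automatic (am = 0) and manual (am = 1) cars.
tt <- t.test(mpg ~ am, data = mtcars)
class(tt)  # "htest": a classed list, not an atomic vector

# Wrapping the test in list() gives a list-column that holds one htest per row.
tests <- tibble(comparison = "mpg by am", test = list(tt))
tests
tests$test[[1]]$p.value
```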

library(dplyr)
library(purrr)  # map() and map_dbl() are used below

output <- df %>%
        select(-ID) %>%
        group_by(Age, Group) %>%
        summarize_at(
            vars(-group_cols(), -Session),
            list(t.test = ~ list(t.test(. ~ Session))))

output

# A tibble: 4 × 15
# Groups:   Age [2]
  Age   Group    RHR_t.test HRV_t.test Sleep.Onset_t.test Wake.Onset_t.test Hours.in.Bed_t.test Hours.of.Sleep_t.test Sleep.Disturbance… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
  <chr> <chr>    <list>     <list>     <list>             <list>            <list>              <list>                <list>             <list>           <list>        <list>          
1 Old   Decrease <htest>    <htest>    <htest>            <htest>           <htest>             <htest>               <htest>            <htest>          <htest>       <htest>         
2 Old   Increase <htest>    <htest>    <htest>            <htest>           <htest>             <htest>               <htest>            <htest>          <htest>       <htest>         
3 Young Decrease <htest>    <htest>    <htest>            <htest>           <htest>             <htest>               <htest>            <htest>          <htest>       <htest>         
4 Young Increase <htest>    <htest>    <htest>            <htest>           <htest>             <htest>               <htest>            <htest>          <htest>       <htest>

With this output data.frame, we can extract individual tests and values from them as desired:

output$RHR_t.test

[[1]]

    Welch Two Sample t-test

data:  . by Session
t = -1.8965, df = 188.22, p-value = 0.05942
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
 -3.09118590  0.06082897
sample estimates:
mean in group Post  mean in group Pre 
          62.28902           63.80420 


[[2]]

    Welch Two Sample t-test

data:  . by Session
t = -2.6271, df = 226.21, p-value = 0.009199
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
 -3.3949577 -0.4848655
sample estimates:
mean in group Post  mean in group Pre 
          57.95946           59.89937 


[[3]]

    Welch Two Sample t-test

data:  . by Session
t = 1.6633, df = 251.75, p-value = 0.0975
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
 -0.2074028  2.4611194
sample estimates:
mean in group Post  mean in group Pre 
          60.58255           59.45570 


[[4]]

    Welch Two Sample t-test

data:  . by Session
t = 1.5849, df = 208.4, p-value = 0.1145
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
 -0.244287  2.247775
sample estimates:
mean in group Post  mean in group Pre 
          60.23462           59.23288

output$RHR_t.test %>%
    map_dbl('p.value')

[1] 0.059424354 0.009199459 0.097497620 0.114502332
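The same map_dbl() pattern extends to the other values the question asks for. A sketch, assuming a plain list of htest objects like output$RHR_t.test; a hypothetical stand-in built from mtcars is used here so the block runs on its own. For a two-sample t.test(), statistic is t, parameter is the (Welch) df, conf.int holds the 95% interval for the difference, and estimate holds the two group means.

```r
library(purrr)

# Hypothetical stand-in for output$RHR_t.test: a plain list of htest objects.
tests <- list(
  t.test(mpg ~ am, data = mtcars),
  t.test(hp ~ am, data = mtcars)
)

# Pull each component out by name, or by position inside a component.
test_stats <- data.frame(
  mean_1    = map_dbl(tests, ~ .x$estimate[[1]]),
  mean_2    = map_dbl(tests, ~ .x$estimate[[2]]),
  t         = map_dbl(tests, "statistic"),
  df        = map_dbl(tests, "parameter"),
  conf.low  = map_dbl(tests, ~ .x$conf.int[[1]]),
  conf.high = map_dbl(tests, ~ .x$conf.int[[2]]),
  p.value   = map_dbl(tests, "p.value")
)
test_stats
```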

We can also convert these lists to user-friendly tibbles with broom::tidy, saving the result as tidy_output:

tidy_output <- output %>%
    mutate(across(ends_with('t.test'), ~ map(.x, broom::tidy)))

tidy_output

# A tibble: 4 × 15
# Groups:   Age [2]
  Age   Group    RHR_t.test        HRV_t.test   Sleep.Onset_t.te… Wake.Onset_t.test Hours.in.Bed_t.t… Hours.of.Sleep_… Sleep.Disturbanc… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
  <chr> <chr>    <list>            <list>       <list>            <list>            <list>            <list>           <list>            <list>           <list>        <list>          
1 Old   Decrease <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
2 Old   Increase <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
3 Young Decrease <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
4 Young Increase <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
# … with 3 more variables: Deep.Sleep.hours_t.test <list>, Light.Sleep.hours_t.test <list>, Awake.hours_t.test <list>
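If one row per test is more convenient than one list-column per variable, the tidied tibbles can be unnested into ordinary columns. A sketch with a hypothetical stand-in (two tests on mtcars). For a Welch two-sample test, broom::tidy() returns estimate, estimate1, estimate2, statistic, p.value, parameter, conf.low, conf.high, method and alternative, which are the 10 columns seen above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical stand-in: one row per test, with the htest in a list-column.
tests <- tibble(
  variable = c("mpg", "hp"),
  test = list(t.test(mpg ~ am, data = mtcars), t.test(hp ~ am, data = mtcars))
)

# Tidy each htest into a 1-row tibble, then unnest into ordinary columns.
tidy_long <- tests %>%
  mutate(test = map(test, tidy)) %>%
  unnest(test)

tidy_long
```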

To extract the t statistic from every test, we can do this:

tidy_output %>%
    mutate(across(ends_with('t.test'), ~ map_dbl(.x, 'statistic')))

# A tibble: 4 × 15
# Groups:   Age [2]
  Age   Group    RHR_t.test HRV_t.test Sleep.Onset_t.test Wake.Onset_t.test Hours.in.Bed_t.test Hours.of.Sleep_t.test Sleep.Disturbance… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
  <chr> <chr>         <dbl>      <dbl>              <dbl>             <dbl>               <dbl>                 <dbl>              <dbl>            <dbl>         <dbl>            <dbl>
1 Old   Decrease      -1.90      0.171              0.684            -0.145             -1.01                 -1.02               -1.45            3.05          -0.928           -0.906
2 Old   Increase      -2.63      0.477             -1.66             -1.96              -2.69                 -2.69               -1.16            0.0848        -1.76            -1.87 
3 Young Decrease       1.66      1.13               0.281            -0.305              0.0509               -0.0320              1.14           -1.34          -0.675            0.672
4 Young Increase       1.58      0.519              0.195            -1.40              -1.48                 -1.17                0.397          -0.821         -1.73             0.886
# … with 3 more variables: Deep.Sleep.hours_t.test <dbl>, Light.Sleep.hours_t.test <dbl>, Awake.hours_t.test <dbl>
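The question also asks for SDs and corrected p-values, which neither t.test() nor broom::tidy() returns directly. One possible end-to-end sketch, assuming the data are reshaped to long format first, computes everything per variable x Age x Group and applies a Bonferroni correction across all tests. The toy data, the night column and the output column names are hypothetical, and which tests form the correction family is a judgment call for the analysis. Calling t.test(Post, Pre) gives the Post minus Pre difference, matching the formula interface above because "Post" sorts before "Pre".

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: 2 Age x 2 Group, 20 nights each at Pre and Post.
set.seed(42)
toy <- expand.grid(
  Age = c("Old", "Young"),
  Group = c("Decrease", "Increase"),
  Session = c("Pre", "Post"),
  night = 1:20,
  stringsAsFactors = FALSE
)
toy$RHR <- rnorm(nrow(toy), mean = 60, sd = 5)
toy$HRV <- rnorm(nrow(toy), mean = 60, sd = 15)

results <- toy %>%
  select(-night) %>%
  pivot_longer(c(RHR, HRV), names_to = "variable", values_to = "value") %>%
  group_by(variable, Age, Group) %>%
  summarise(
    mean_pre  = mean(value[Session == "Pre"]),
    sd_pre    = sd(value[Session == "Pre"]),
    mean_post = mean(value[Session == "Post"]),
    sd_post   = sd(value[Session == "Post"]),
    # Welch test of Post vs Pre, kept as an htest in a list-column
    test = list(t.test(value[Session == "Post"], value[Session == "Pre"])),
    .groups = "drop"
  ) %>%
  mutate(
    t         = map_dbl(test, "statistic"),
    df        = map_dbl(test, "parameter"),
    conf.low  = map_dbl(test, ~ .x$conf.int[[1]]),
    conf.high = map_dbl(test, ~ .x$conf.int[[2]]),
    p.value   = map_dbl(test, "p.value"),
    # Bonferroni across all rows, i.e. every variable x Age x Group test
    p.adj     = p.adjust(p.value, method = "bonferroni")
  ) %>%
  select(-test)

results
```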
