R t.test group by

Question

I have stumbled across a bit of an annoying problem. I am trying to perform multiple independent sample t-tests at once, grouped by a value.

To put it into an example: In 5 cities we have measured job satisfaction between males and females. For every subject we know their city, their gender, and their satisfaction score. We can do a t-test over the whole sample like this:

t.test(Score ~ Gender, paired = FALSE)

However, I want to perform a t-test for each city, to see if there is a difference in the average job satisfaction per gender per city. How do I do this?

I answered a very similar question here. See: https://stackoverflow.com/a/60553532/12176280 — George Savva, Mar 10 '20 at 11:27
But you might also consider whether you want a linear model to test the interaction between city and gender, that is testing whether the effect of gender differs by city. That is a slightly different question. — George Savva, Mar 10 '20 at 11:29
As a side note: Be aware of the ['multiple testing problem'](https://xkcd.com/882/) — dario, Mar 10 '20 at 11:51
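To make the linear-model suggestion in the comments concrete, here is a minimal sketch. It assumes mtcars as stand-in data, with cyl playing the role of city and am the role of gender; the object names (dat, fit_additive, fit_interaction, cmp) are made up for illustration. Unlike the default Welch t-test, lm assumes equal variances across groups, so this is a related but not identical analysis.

```r
# Sketch: does the effect of am on mpg differ between cyl groups?
# mtcars is stand-in data (cyl ~ city, am ~ gender).
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Model without and with the cyl-by-am interaction.
fit_additive    <- lm(mpg ~ cyl + am, data = dat)
fit_interaction <- lm(mpg ~ cyl * am, data = dat)

# F-test comparing the nested models: a small p-value would suggest
# that the am effect is not the same in every cyl group.
cmp <- anova(fit_additive, fit_interaction)
print(cmp)
```

This answers "does the gender effect vary by city?" with a single test, rather than "is there a gender effect in each city?", which is what the per-group t-tests address.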

score 4 · Accepted Answer · answered Mar 10 '20 at 11:29

You can use lapply with split to run a t.test within each group:

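# paired = FALSE is the default, so it is left out; R 4.4.0 and later
# raise an error if `paired` is passed to the formula interface.
lapply(split(mtcars, factor(mtcars$cyl)), function(x) t.test(mpg ~ am, data = x))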

Here I used the mtcars data and ran an independent-samples t.test of mpg (continuous) by am (categorical) within each cyl group. Let me know if this is not what you were expecting.
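The lapply call returns a list of htest objects, one per group. As a rough sketch of how the p-values could be collected and adjusted for the multiple testing issue raised in the comments, something like the following should work. The names tests, p_raw and p_holm are only illustrative, and Holm is just one of the methods p.adjust offers.

```r
# One Welch t-test of mpg by am per cyl group (mtcars as example data).
tests <- lapply(split(mtcars, mtcars$cyl), function(x) t.test(mpg ~ am, data = x))

# Pull the p-value out of each htest object; names are the cyl levels.
p_raw <- sapply(tests, function(tt) tt$p.value)

# Holm adjustment controls the family-wise error rate across the groups.
p_holm <- p.adjust(p_raw, method = "holm")

# Collect everything in one data frame, one row per group.
data.frame(cyl = names(tests), p_raw = p_raw, p_holm = p_holm, row.names = NULL)
```

Other components such as tt$statistic or tt$conf.int can be extracted the same way.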

score 2 · Answer 2 · answered Jun 05 '23 at 11:29

You can use the purrr package to return the test data as a tibble:

library(tidyverse)
mtcars %>%
  select(mpg, am, cyl) %>%
  nest(data = c(mpg, am)) %>%
  mutate(data = map(data, ~ {
    out <- t.test(.x$mpg ~ .x$am)
    tibble(t_value = out$statistic, p_value = out$p.value)
  })) %>%
  unnest(cols = data)

# A tibble: 3 × 3
    cyl t_value p_value
  <dbl>   <dbl>   <dbl>
1     6  -1.56   0.187 
2     4  -2.89   0.0180
3     8  -0.391  0.704
