I have stumbled across a bit of an annoying problem. I am trying to perform multiple independent sample t-tests at once, grouped by a value.

To put it into an example: In 5 cities we have measured job satisfaction between males and females. For every subject we know their city, their gender, and their satisfaction score. We can do a t-test over the whole sample like this:

t.test(Score ~ Gender)

However, I want to perform a t-test for each city, to see if there is a difference in the average job satisfaction per gender per city. How do I do this?

koolmees
  • I answered a very similar question here. See: https://stackoverflow.com/a/60553532/12176280 – George Savva Mar 10 '20 at 11:27
  • But you might also consider whether you want a linear model to test the interaction between city and gender, that is testing whether the effect of gender differs by city. That is a slightly different question. – George Savva Mar 10 '20 at 11:29
  • As a side note: be aware of the ['multiple testing problem'](https://xkcd.com/882/) – dario Mar 10 '20 at 11:51

2 Answers

You can use lapply with split to run a separate t-test for each group:

lapply(split(mtcars, mtcars$cyl), function(x) t.test(mpg ~ am, data = x))

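Here I have used the mtcars data and grouped by cyl, so that within each cyl group an independent-samples t-test compares mpg (continuous) between the two levels of am (categorical). With the formula interface, t.test treats the two groups as independent samples by default. Let me know if this is not what you were expecting.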
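
The list of test objects can also be collapsed into one small table, which makes the per-group results easier to compare. The following is only a sketch using the same mtcars grouping. The names tests and results are made up for illustration, and the Holm adjustment via p.adjust is an optional extra (prompted by the multiple-testing comment under the question), not something the question asked for.

```r
# Run one Welch t-test of mpg by am within each cyl group (base R only).
tests <- lapply(split(mtcars, mtcars$cyl), function(x) t.test(mpg ~ am, data = x))

# Pull the t statistic and p-value out of each "htest" object.
results <- data.frame(
  cyl     = names(tests),
  t_value = sapply(tests, function(tt) unname(tt$statistic)),
  p_value = sapply(tests, function(tt) tt$p.value)
)

# Optional: Holm-adjusted p-values, since several tests are run at once.
# (This step is an assumption about what you may want, not a requirement.)
results$p_holm <- p.adjust(results$p_value, method = "holm")

results
```

For the city example you would presumably split on the city column and test Score ~ Gender instead, assuming the data sit in a data frame with those column names.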

PKumar

You can use the purrr package to return the test data as a tibble:

library(tidyverse)
mtcars %>%
  select(mpg, am, cyl) %>%
  nest(data = c(mpg, am)) %>%
  mutate(data = map(data, ~ {
    out <- t.test(.x$mpg ~ .x$am)
    tibble(t_value = out$statistic, p_value = out$p.value)
  })) %>%
  unnest(cols = data)

# A tibble: 3 × 3
    cyl t_value p_value
  <dbl>   <dbl>   <dbl>
1     6  -1.56   0.187 
2     4  -2.89   0.0180
3     8  -0.391  0.704 
ha-pu