Multi-group differential gene expression for time-series treatment data

Question

This is an example dataset:

df = data.frame(genes = c("A", "B", "C", "D", "E"),
                KO_0min_Rep1 = c(0, 1, 2, 6, 6),
                KO_0min_Rep2 = c(0, 3, 2, 3, 6),
                KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
                KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
                KO_120min_Rep1 = c(0, 1, 1, 6, 5),
                KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
                WT_0min_Rep1 = c(0, 1, 2, 6, 6),
                WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
                WT_60min_Rep1 = c(0, 1, 2, 9, 6),
                WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
                WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
                WT_120min_Rep2 = c(0, 1.2, 2, 6, 2)  )

The data-frame has several columns, of which the "genes" column has >9000 genes and all other columns are various conditions and treatments. The experimental design is as follows: I have two kinds of cell types: wild type (WT) and knockout (KO). To both of these cell types I treated cells with a DNA damaging agent for 0 minutes, 60 minutes, and 120 minutes. I also have two replicates for these conditions and treatments combinations.

I want to know the genes that are significantly altered after the treatments, but more importantly between the WT and KO conditions over the treatment time points.

This is what I have tried:

lme4_dat = df %>%
  tidyr::gather(conditions, value, -genes) %>%
  dplyr::mutate( group = case_when(grepl("KO", conditions) ~ "KO",
                                   grepl("WT", conditions) ~ "WT")) %>%
  dplyr::mutate( time = case_when(grepl("UT", conditions) ~ "0",
                                  grepl("60", conditions) ~ "60",
                                  grepl("120", conditions) ~ "120" )) %>%
  dplyr::mutate( replicate = case_when(grepl("_Rep1", conditions) ~ "Rep1",
                                       grepl("_Rep2", conditions) ~ "Rep2"))

Then I try to fit a linear mixed-effects model

lme4_model = lme4::lmer(value ~ conditions * time + (1|genes) + (1|replicate), data = lme4_dat)

It's obviously not working. I am not sure if I am doing it correctly? Or is there a better alternative?

Any guidance will be much appreciated. Thank you.

score 2 · Accepted Answer · answered May 29 '23 at 23:42

I would suggest repeated measures ANOVA with one within-groups and one between-groups factor.

lme4_dat$time[is.na(lme4_dat$time)] = 0
lme4_dat$time = as.factor(lme4_dat$time)
fit <- aov(value ~ time*group + Error(genes/time), lme4_dat)
summary(fit)
library(HH)
interaction2wt(value ~ time*group, lme4_dat)
interaction.plot(lme4_dat$time, lme4_dat$group, lme4_dat$value)

Output

Error: genes
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  310.4   77.59               

Error: genes:time
          Df Sum Sq Mean Sq F value Pr(>F)
time       2  2.617   1.309   0.424  0.668
Residuals  8 24.678   3.085               

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
group       1   2.48  2.4807   1.892  0.176
time:group  2   0.21  0.1047   0.080  0.923
Residuals  42  55.07  1.3112

The group (KO/WT) is borderline significant. The time is less so. The interaction isn't significant with a p.value of .923, which you can also tell by looking at parallel lines in the interaction plot. At time 60 the response increased before dropping at 120.

Thank you so much. This worked as I wanted. – ip2018 May 30 '23 at 00:08 — ip2018, May 30 '23 at 00:08

Multi-group differential gene expression for time-series treatment data

1 Answers1