This is an example dataset:
df = data.frame(genes = c("A", "B", "C", "D", "E"),
KO_0min_Rep1 = c(0, 1, 2, 6, 6),
KO_0min_Rep2 = c(0, 3, 2, 3, 6),
KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
KO_120min_Rep1 = c(0, 1, 1, 6, 5),
KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
WT_0min_Rep1 = c(0, 1, 2, 6, 6),
WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
WT_60min_Rep1 = c(0, 1, 2, 9, 6),
WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
WT_120min_Rep2 = c(0, 1.2, 2, 6, 2) )
The data-frame has several columns, of which the "genes" column has >9000 genes and all other columns are various conditions and treatments. The experimental design is as follows: I have two kinds of cell types: wild type (WT) and knockout (KO). To both of these cell types I treated cells with a DNA damaging agent for 0 minutes, 60 minutes, and 120 minutes. I also have two replicates for these conditions and treatments combinations.
I want to know the genes that are significantly altered after the treatments, but more importantly between the WT and KO conditions over the treatment time points.
This is what I have tried:
lme4_dat = df %>%
tidyr::gather(conditions, value, -genes) %>%
dplyr::mutate( group = case_when(grepl("KO", conditions) ~ "KO",
grepl("WT", conditions) ~ "WT")) %>%
dplyr::mutate( time = case_when(grepl("UT", conditions) ~ "0",
grepl("60", conditions) ~ "60",
grepl("120", conditions) ~ "120" )) %>%
dplyr::mutate( replicate = case_when(grepl("_Rep1", conditions) ~ "Rep1",
grepl("_Rep2", conditions) ~ "Rep2"))
Then I try to fit a linear mixed-effects model
lme4_model = lme4::lmer(value ~ conditions * time + (1|genes) + (1|replicate), data = lme4_dat)
It's obviously not working. I am not sure if I am doing it correctly? Or is there a better alternative?
Any guidance will be much appreciated. Thank you.