2

I'm looking for a way to identify genes that are significantly stable across conditions. In other words, the opposite of standard DE analysis.

Standard DE splits genes in two categories: significantly changing on one side, everything else, "the rest", on the other. "The rest", however, contains both genes that actually do not change, and genes for which the confidence in the change is not sufficient to call them differential.
What I want is to find those that do not change, or in other words, those for which I can confidently say that there's no change across my conditions.

I know this is possible in DEseq by providing an alternative null-hypothesis, but I have to integrate this as an extra step into someone else's pipeline that already uses limma, and I'd like to stick to it. Ideally I would like to test for both DE and non changing genes in a similar way, something conceptually similar to changing the H0 in DEseq.

At the moment the code to test for DE goes like:

# shaping data
comparison <- eBayes(lmFit(my_data, weights = my.weights^2))
results <- limma::topTable(my_data, sort.by = "t",  
                     coef = 1, number = Inf)

as an example I'd love something like the following, but anything conceptually alike would do.

comparison <- eBayes(lmFit(my_data, weights = my.weights^2), ALTERNATIVE_H0 = my_H0)

I know treat() allows to specify an interval null hypothesis by providing a fold change, citing the manual: "it uses an interval null hypothesis, where the interval is [-lfc,lfc]".
However this still tests for change from a central interval around 0, while the intervals I would like to test against are [-inf,-lfc] + [lfc,inf].

Is there any option I'm missing?

Thanks!

Matteo
  • 265
  • 5
  • 16
  • did you come up with a solution eventually? – Sashko Lykhenko Dec 03 '19 at 16:06
  • Unfortunately not. I chased it for a bit but then the analysis started diverging and I slowly started forgetting about it. I'd still be curious though, if someone has suggestions – Matteo Dec 03 '19 at 20:19

1 Answers1

2

You can try to use the confidence interval of the logFC to select your genes, but I must say this is very dependent on the number of samples you have, and also how strong is the biological variance. Below I show an example how it can be done:

first we use DESeq2 to generate an example dataset, we set betaSD so that we have a small proportion of genes that should show differences between conditions

library(DESeq2)
library(limma)
set.seed(100)
dds = makeExampleDESeqDataSet(n=2000,betaSD=1)
#pull out the data
DF = colData(dds)
# get out the true fold change
FC = mcols(dds)

Now we can run limma-voom on this dataset,

V = voom(counts(dds),model.matrix(~condition,data=DF))
fit = lmFit(V,model.matrix(~condition,data=DF))
fit = eBayes(fit)
# get the results, in this case, we are interested in the 2nd coef
res = topTable(fit,coef=2,n=nrow(V),confint=TRUE)

So there is an option to collect the 95% confidence interval of the fold change in the function topTable. We do that and compare against the true FC:

# fill in the true fold change
res$true_FC = FC[rownames(res),"trueBeta"]

We can look at how the estimated and true differ:

plot(res$logFC,res$true_FC)

enter image description here

So let's say we want to find genes, where we are confident there's a fold change < 1, we can do:

tabResults = function(tab,fc_cutoff){
true_unchange = abs(tab$true_FC)<fc_cutoff
pred_unchange = tab$CI.L>(-fc_cutoff) & res$CI.R <fc_cutoff
list(
X = table(pred_unchange,true_unchange),
expression_distr = aggregate(
tab$AveExpr ~ pred_unchange+true_unchange,data=tab,mean
))
}

tabResults(res,1)$X

             true_unchange
pred_unchange FALSE TRUE
        FALSE   617 1249
        TRUE      7  127

The above results tells us, if we set limit it to genes whose 95% confidence are within +/- 1 FC, we get 134 hits, with 7 being false (with actual fold change > 1).

And the reason we miss out on some true no-changing genes is because they are expressed a bit lower, while most of what we predicted correctly to be unchanging, have high expression:

tabResults(res,1)$expression_distr

  pred_unchange true_unchange tab$AveExpr
1         FALSE         FALSE    7.102364
2          TRUE         FALSE    8.737670
3         FALSE          TRUE    6.867615
4          TRUE          TRUE   10.042866

We can go lower FC, but we also end up with less genes:

tabResults(res,0.7)

             true_unchange
pred_unchange FALSE TRUE
        FALSE   964 1016
        TRUE      1   19

The confidence interval depends a lot on the number of samples you have. So a cutoff of 1 for one dataset would mean something different for another.

So I would say if you have a dataset at hand, you can first run DESeq2 on the dataset, obtain the mean variance relationship and simulate data like I have done, to more or less guess, what fold change cutoff would be ok, how many can you possibly get, and make a decision from there.

StupidWolf
  • 45,075
  • 17
  • 40
  • 72