Issues with limma for analysis of microarray gene expression data - possibly related to design matrix

Question

I am fairly new to R, and have recently started using it to analyse some microarray data. The overall aim of the analysis is to take DC2 and compare the WT vs KO groups in this population. But I have come across some problems with the limma processing. After processing the data using the oligo package, I have then tried to create a design matrix for analysis using limma. This is my workflow for the ExpressionSet of DC2:

pData(DC2)

         index filename genotype cell_type
1 KO DC2     2 HP10.CEL       KO       DC2
2 KO DC2     3 HP11.CEL       KO       DC2
3 KO DC2     4 HP12.CEL       KO       DC2
1 WT DC2    10  HP7.CEL       WT       DC2
2 WT DC2    11  HP8.CEL       WT       DC2
3 WT DC2    12  HP9.CEL       WT       DC2    

design <- model.matrix(~DC2$genotype)
design

  (Intercept) DC2$genotypeWT
1           1              0
2           1              0
3           1              0
4           1              1
5           1              1
6           1              1


fit <- lmFit(DC2, design)
fit <- eBayes(fit)
toptable(fit)

This feeds out a list of genes as follows:

            logFC        t      P.Value    adj.P.Val        B
17551163 14.09722 208.2627 2.990326e-13 2.700912e-10 17.14467
17511316 13.91167 205.0811 3.292503e-13 2.700912e-10 17.12716
17551167 13.92093 204.5801 3.343243e-13 2.700912e-10 17.12434
17375373 13.76320 202.1271 3.605170e-13 2.700912e-10 17.11025
17550685 13.74022 201.5428 3.671032e-13 2.700912e-10 17.10682

However, when I go to check this manually (just taking the first feature) using this code:

toptable(fit, n=1)
genename <- rownames(toptable(fit, n=1))
typeMean <- tapply(exprs(DC2)[genename,], DC2$genotype, mean)
typeMean["KO"] - typeMean["WT"]

The output for the same feature "17551163" is different

     KO 
0.04538317

I have tried to search around for an answer, but with no luck. I'm assuming that this may be something to do with the matrix design? Any help would be greatly appreciated.

Thanks

@PoGibas Thanks for the reply. The expression values are: "1 KO DC2" 14.0912151923856 "2 KO DC2" 14.0912151923856 "3 KO DC2" 14.1092351251855 "1 WT DC2" 14.0592734145348 "2 WT DC2" 14.0327197808068 "3 WT DC2" 14.0635228171713 — Jack, Oct 27 '17 at 10:24
Try putting the `coef` parameter to the `topTable`, to declare in which contrast you are interested: `topTable(fit, coef = 2)`. Does this change the output? — storaged, Oct 27 '17 at 12:34
@storaged Ah, great. That seems to have worked -- thanks a lot. Just conceptually, does that `coef = 2` bit specify the comparison between the 0s and 1s in the `DC2$genotypeWT` column? So when I set `coef = 1`, that compares the 1s in the `(Intercept)` to the 0s and 1s in the `DC2$genotypeWT` column? — Jack, Oct 27 '17 at 13:38
@Jack as far as I know the `limma` package it is the way you described it. `coef = 2` explicitly says that we want the comparison within `DC2$genotypeWT`. Glad it worked for you! — storaged, Oct 27 '17 at 13:45

storaged · Accepted Answer · 2017-10-27T14:09:35.483

An answer, for those who skip reading the discussion in comments below the question.

After performing the estimation with lmFit and eBayes we can question the top differentiating genes between all the contrasts that we provided in the model.matrix step.

Here, the author created the design as follows: design <- model.matrix(~DC2$genotype). Keeping in mind that the (Intercept) is the first coefficient if we want need to explicitly say that we want the contrast related to the DC2$genotype, so the call should be:

toptable(fit, coef = 2)

Naturally, if the design contains more contrasts they are assigned consecutive natural numbers.

REMARK

If we want to remove the intercept from the design design <- model.matrix(~ -1 + DC2$genotype); the first coefficient is now DC2$genotype.

Issues with limma for analysis of microarray gene expression data - possibly related to design matrix

1 Answers1