
Fisher's exact test is often used for over-representation analysis of a gene list against a pathway. Consider the following example of a contingency table:

              in pathway
                 Y   N
in gene list  Y 90  110  |  200 
              N 10  790  |  800
              ------------------
               100  900  | 1000

There are essentially two ways to do a Fisher-test-based over-representation analysis in R. The first is to use fisher.test, which takes the contingency matrix as input:

fisher.test(matrix(c(90,10,110,790), nrow = 2), alternative = 'greater')$p.value
[1] 1.486473e-59

The second is to use phyper. Meng's notes give an excellent explanation of how to use phyper, including what q, m, n and k mean and why the "-1" is needed: with lower.tail = FALSE, phyper returns P(X > q), so q = 90 - 1 gives P(X >= 90).

phyper(q=90-1, m=100, n=900, k=200, lower.tail = FALSE)
[1] 1.486473e-59
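
As a sketch of how the two calls relate, the phyper arguments can be read off the margins of the contingency table. The names tab, p_fisher, p_hyper and rel_diff are illustrative only and are not part of either function's interface.

```r
# Contingency table from the question
# (rows: in gene list Y/N; columns: in pathway Y/N; filled column-wise).
tab <- matrix(c(90, 10, 110, 790), nrow = 2,
              dimnames = list(in_gene_list = c("Y", "N"),
                              in_pathway   = c("Y", "N")))

q <- tab["Y", "Y"]      # genes in both the list and the pathway (90)
m <- sum(tab[, "Y"])    # genes in the pathway (100)
n <- sum(tab[, "N"])    # genes not in the pathway (900)
k <- sum(tab["Y", ])    # genes in the list (200)

p_fisher <- fisher.test(tab, alternative = "greater")$p.value
p_hyper  <- phyper(q - 1, m, n, k, lower.tail = FALSE)  # P(X >= q) = P(X > q - 1)

# Compare on a relative scale: all.equal() falls back to an absolute
# comparison for values this small, which would not be informative here.
rel_diff <- abs(p_fisher - p_hyper) / p_hyper
rel_diff
```

A relative difference is used because both p-values are many orders of magnitude below any default absolute tolerance.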

My question: why does the phyper call with lower.tail = FALSE give a different result from:

1 - phyper(q=90-1, m=100, n=900, k=200, lower.tail = TRUE)
[1] 0
Aditya

1 Answer


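The C-level code for phyper avoids some calculations that introduce floating-point error, so it is more accurate when you ask directly for the tail you are interested in. In this example the lower tail equals 1 - 1.486473e-59, which differs from 1 by far less than the machine epsilon of a double (about 2.2e-16). It is therefore stored as exactly 1, and subtracting it from 1 gives 0.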
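
The loss of precision can be reproduced with the numbers from the question. This is a minimal sketch; the names p_upper, p_lower and log_p_upper are illustrative. The log.p argument is a standard phyper option that may help when a tail is too small even for a direct double-precision result.

```r
# Upper tail requested directly: no subtraction from 1 in user code.
p_upper <- phyper(q = 90 - 1, m = 100, n = 900, k = 200, lower.tail = FALSE)

# Lower tail: so close to 1 that a double cannot tell it apart from 1.
p_lower <- phyper(q = 90 - 1, m = 100, n = 900, k = 200, lower.tail = TRUE)

p_lower == 1           # TRUE: the tiny upper tail has been rounded away
1 - p_lower            # 0: nothing is left to recover by subtraction
.Machine$double.eps    # machine epsilon for doubles, about 2.2e-16
p_upper                # far below that epsilon, yet still representable

# The same upper tail on the log scale.
log_p_upper <- phyper(q = 90 - 1, m = 100, n = 900, k = 200,
                      lower.tail = FALSE, log.p = TRUE)
```

The general rule is to request the small tail directly through lower.tail rather than computing it as one minus the large tail.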

Richie Cotton