
In genetics, very small p-values are common (for example, 10^-400). I am looking for a way to compute very small two-tailed p-values in R when the z-score is large, for example:

z <- 40
pvalue <- 2 * pnorm(abs(z), lower.tail = FALSE)

This returns exactly zero instead of the very small (and highly significant) p-value.

user971102
  • probably a dupe: you need to compute the p-value on the log scale (`log(2) + pnorm(abs(z), lower.tail=FALSE, log.p=TRUE)`) – Ben Bolker Sep 26 '17 at 00:28
  • also maybe of interest https://stats.stackexchange.com/questions/78839/how-should-tiny-p-values-be-reported-and-why-does-r-put-a-minimum-on-2-22e-1 – user20650 Sep 26 '17 at 00:30
  • https://stackoverflow.com/questions/26467736/r-small-pvalues/26471088#26471088 is very closely related. – Ben Bolker Sep 26 '17 at 00:30
  • Ah! That's clever. And that would be the exponent of the p-value right? – user971102 Sep 26 '17 at 00:31
  • @user20650: yes, but in this case the p-values are not just smaller than R's default reporting limit (2e-16) but actually smaller than can be represented with 64-bit floating point values ... – Ben Bolker Sep 26 '17 at 00:32
  • Thank you Ben! That seems like a huge lack for R... Thanks though – user971102 Sep 26 '17 at 00:38

1 Answer


The inability to handle p-values smaller than about 10^(-308) (`.Machine$double.xmin`) is not really R's fault; it is a generic limitation of any computational system that stores numeric information as double-precision (64-bit) floats.
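
As an illustration of that limit (not from the original answer), you can see where doubles run out directly in an R session; subnormal values stretch a little below `.Machine$double.xmin`, down to roughly 5e-324, before literals silently underflow to zero:

```r
.Machine$double.xmin   # smallest *normalized* positive double, about 2.2e-308
1e-320                 # subnormal: still representable, with reduced precision
1e-400                 # below the subnormal range: silently becomes exactly 0
1e-400 == 0            # TRUE
```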

It's not hard to solve the problem by computing on the log scale, but you can't store the result as a numeric value in R; instead, you need to store (or print) the result as a mantissa plus exponent.

pvalue.extreme <- function(z) {
   ## two-tailed p-value on the natural-log scale: log(2) + log(upper tail area)
   log.pvalue <- log(2) + pnorm(abs(z), lower.tail = FALSE, log.p = TRUE)
   log10.pvalue <- log.pvalue/log(10)   ## from natural log to log10
   mantissa <- 10^(log10.pvalue %% 1)   ## fractional part -> mantissa in [1, 10)
   exponent <- log10.pvalue %/% 1       ## integer part -> power of ten
   ## or return(c(mantissa, exponent))
   return(sprintf("p value is %1.2f times 10^(%d)", mantissa, exponent))
}

Test with a not-too-extreme case:

pvalue.extreme(5)
## [1] "p value is 5.73 times 10^(-7)"
2*pnorm(5,lower.tail=FALSE)
## [1] 5.733031e-07

More extreme:

pvalue.extreme(40)
## [1] "p value is 7.31 times 10^(-350)"

There are a variety of packages that handle extremely large/small numbers with extended precision in R (Brobdingnag, Rmpfr, ...). For example:

library(Rmpfr)  ## provides mpfr() and an mpfr method for pnorm()
2*Rmpfr::pnorm(mpfr(40, precBits=100), lower.tail=FALSE, log.p = FALSE)
## 1 'mpfr' number of precision  100   bits 
## [1] 7.3117870818300594074979715966414e-350

However, you will pay a big cost in computational efficiency and convenience for working with an arbitrary-precision system.
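
To get a feel for that cost, here is a rough timing sketch (mine, not from the original answer; guarded in case Rmpfr is not installed). It compares the vectorized log-scale route, which works in plain doubles, against the same computation carried out element-by-element in 100-bit `mpfr` numbers:

```r
z <- rep(40, 1e4)

## base R on the log scale: ordinary double arithmetic, fully vectorized
t_base <- system.time(
  log(2) + pnorm(abs(z), lower.tail = FALSE, log.p = TRUE)
)

## arbitrary precision: every element becomes a 100-bit mpfr object
if (requireNamespace("Rmpfr", quietly = TRUE)) {
  t_mpfr <- system.time(
    2 * Rmpfr::pnorm(Rmpfr::mpfr(abs(z), precBits = 100), lower.tail = FALSE)
  )
  print(rbind(base = t_base, mpfr = t_mpfr))
}
```

On typical hardware the mpfr route is slower by orders of magnitude, which is fine for a handful of headline associations but painful for a genome-wide scan.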

Ben Bolker
  • I see! Thank you for the thorough and clear explanation which restored my faith in R. So R is OK up to 10^(-323), which may be enough anyway for top genetic associations. Thank you. – user971102 Sep 26 '17 at 05:30