41

I've found this issue with t-tests and chi-squared tests in R, but I assume it applies to other tests generally. If I do:

a <- 1:10
b <- 100:110
t.test(a,b) 

I get: t = -64.6472, df = 18.998, p-value < 2.2e-16. I know from the comments that 2.2e-16 is the value of .Machine$double.eps, the smallest floating-point number such that 1 + x != 1, but of course R can represent numbers much smaller than that. I know also from the R FAQ that R has to round floats to 53 binary digits of accuracy.
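For reference, the relevant limits on a typical machine (the printed values below are from my machine and may differ slightly on yours):

.Machine$double.eps   # 2.220446e-16 = 2^-52, i.e. roughly 53 binary digits of precision
.Machine$double.xmin  # 2.225074e-308, the smallest normalised positive double
2^-1074               # 4.940656e-324, the smallest representable (subnormal) double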

A few questions:

(1) Am I correct in reading that as 53 binary digits of precision, or are values in R < .Machine$double.eps not calculated accurately?

(2) Why, when doing such calculations, does R not provide a means to display a smaller value for the p-value, even with some loss of precision?

(3) Is there a way to display a smaller p-value, even if I lose some precision? For a single test two significant figures would be fine; for values I am going to Bonferroni-correct I'll need more. When I say "lose some precision" I mean fewer than 53 binary digits.

(4) Or am I completely mistaken, and is any p-value < .Machine$double.eps wildly inaccurate?

(5) Is R just being honest where other stats packages are not?

In my field very small p-values are the norm; some examples: http://www.ncbi.nlm.nih.gov/pubmed/20154341 and http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002215. This is why I want to represent such small p-values.

Thanks for your help, sorry for such a tortuous question.

JayPeerachai
arandomlypickedname
  • 1
    OK, I'll unaccept the answer. I'd like to know, though, why my question has been downvoted so much, as I don't see a clear answer for it below (or anywhere on the net I've searched). I'll rewrite and hopefully get some clarification. – arandomlypickedname Aug 13 '11 at 23:56
  • 2
    see my answer below. I think your question is sensible and eWizardII is right ... – Ben Bolker Aug 14 '11 at 14:32
  • 1
    I have, following discussions with Ben on chat realised I misunderstood your Q - as did several people by the looks of things. I am inclined now to consider the originally accepted Answer as the correct one - or possibly Ben's as he explains the reasoning. Apologies for the noise. – Gavin Simpson Aug 15 '11 at 17:01
  • 1
    +1 This is a great question in that learning its answer is enlightening to a lot of us as to some statistical applications and our assumptions about R. :) I learned several things. I hope others will fix any downvotes they've reconsidered. – Iterator Aug 15 '11 at 20:32

7 Answers

23

I'm puzzled by several things in the exchange of answers and comments here.

First of all, when I try the OP's original example I don't get a p value as small as the ones that are being debated here (several different 2.13.x versions and R-devel):

a <- 1:10
b <- 10:20
t.test(a,b)
## data:  a and b 
## t = -6.862, df = 18.998, p-value = 1.513e-06

Second, when I make the difference between groups much bigger, I do in fact get the results suggested by @eWizardII:

a <- 1:10
b <- 110:120
(t1 <- t.test(a,b))
# data:  a and b 
# t = -79.0935, df = 18.998, p-value < 2.2e-16
#
> t1$p.value
[1] 2.138461e-25

The behavior of the printed output in t.test is driven by its call to stats:::print.htest (which is also called by other statistical testing functions such as chisq.test, as noted by the OP), which in turn calls format.pval, which presents p values less than its value of eps (which is .Machine$double.eps by default) as < eps. I'm surprised to find myself disagreeing with such generally astute commenters ...
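For example, here is a minimal illustration of that mechanism, reusing the t1 object from above (format.pval's eps argument controls the cutoff):

t1$p.value                        # 2.138461e-25: the exact value is still stored
format.pval(t1$p.value)           # "< 2.22e-16", using the default eps cutoff
format.pval(t1$p.value, eps = 0)  # "2.1385e-25": lower eps and the small value is printed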

Finally, although it seems silly to worry about the precise value of a very small p value, the OP is correct that these values are often used as indices of strength of evidence in the bioinformatics literature -- for example, one might test 100,000 candidate genes and look at the distribution of resulting p values (search for "volcano plot" for one example of this sort of procedure).

Ben Bolker
  • Argh, sorry, in my initial mucking around I managed to conflate the results for various t-tests. You're right, the p-value for a <- 1:10, b <- 10:20 is as you state. I'll fix up my question. – arandomlypickedname Aug 20 '11 at 05:59
13

Two questions:

1) What possible difference in statistical implication would there be between p-values of 1e-16 and 1e-32? If you truly can justify it then using the logged values is the way to go.

2) Why do you use Wikipedia when your interest is in the numerical accuracy of R?

The R FAQ says "Other [meaning non-integer] numbers have to be rounded to (typically) 53 binary digits accuracy." That is roughly 16 decimal digits, which is about the limit. This is how to get the limit of accuracy at the console:

> .Machine$double.eps
[1] 2.220446e-16

That number is effectively zero when interpreted on a range of [0, 1].

IRTFM
  • 4
    1) When doing a great many Bonferroni corrections it is helpful to know the true figure and not just that "it's small". – arandomlypickedname Aug 07 '11 at 05:31
  • 2
    Ah, a spending significance strategy. I doubt that Bonferroni is accurate with the number of multiple corrections of size 2.220446e-16 that would fit into 0.05. That would be something like 4.166667e+13 comparisons. – IRTFM Aug 07 '11 at 13:23
  • 3
    You are assuming that a p-value of 0.05 is sufficient for my application; it is not, nor is it for many applications. For example, the abstract of this paper gives a Bonferroni-corrected p-value of ~1e-10: http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002215 – arandomlypickedname Aug 13 '11 at 04:30
  • That is a very strange approach being suggested in that article. The usual approach with Bonferroni is to adjust the significance level rather than the p-value. That article proposes dividing the p-value by the number of multiple comparisons, which, as I see it, is very strange: it inflates significance rather than properly deflating it. – IRTFM Aug 13 '11 at 06:22
  • 4
    The last comment is not strictly true. `.Machine$double.xmin` (2.22e-308 on my machine) is the smallest number that can be distinguished from zero; `.Machine$double.eps` is the smallest number such that `1+x` can be distinguished from 1 ... – Ben Bolker Aug 15 '11 at 17:35
11

The Wikipedia page you linked to was for the Decimal64 type which R does not use – it uses standard-issue doubles.

First, some definitions from the .Machine help page.

double.eps: the smallest positive floating-point number ‘x’ such that ‘1 + x != 1’. ... Normally ‘2.220446e-16’.

double.xmin: the smallest non-zero normalized floating-point number ... Normally ‘2.225074e-308’.

So you can represent numbers smaller than 2.2e-16, but their accuracy is diminished as you approach the smallest representable values, and that causes problems with calculations. Try some examples with numbers near (and beyond) the smallest representable value.

2e-350 - 1e-350  # mathematically 1e-350, but both operands underflow to 0, so R returns 0
sqrt(1e-350)     # mathematically 1e-175, but 1e-350 has already underflowed to 0, so this is sqrt(0) = 0
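To see the distinction between the two constants quoted above, a quick sketch (the value of x is chosen purely for illustration):

x <- 1e-100  # far larger than double.xmin, so stored as an ordinary double
x > 0        # TRUE: x is perfectly distinguishable from zero
1 + x == 1   # TRUE: x is smaller than double.eps, so it vanishes when added to 1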

You mentioned in a comment that you wanted to do Bonferroni corrections. Rather than rolling your own code for this, I suggest that you use p.adjust(your_p_value, method = "bonferroni") instead. pairwise.t.test uses this.
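A minimal sketch with made-up p-values (Bonferroni simply multiplies each p-value by the number of tests, capping at 1):

pvals <- c(1e-20, 0.004, 0.03)          # hypothetical raw p-values
p.adjust(pvals, method = "bonferroni")  # gives 3e-20, 0.012, 0.09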

Richie Cotton
9

Try something like t.test(a,b)$p.value and see if that gives you the accuracy you need. I believe the issue has more to do with the printing of the result than with the actual stored value, which should have the necessary precision.
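A minimal sketch of the idea, using the vectors from the answer above:

res <- t.test(1:10, 110:120)
print(res)    # the print method reports "p-value < 2.2e-16"
res$p.value   # but the stored value is 2.138461e-25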

Gavin Simpson
eWizardII
  • 8
    This is wrong. As DWin mentions in his answer, `2.2e-16` is the smallest number greater than zero that can be stored (on most systems) due to the way floating-point numbers are handled. It isn't being stored more accurately. – Richie Cotton Aug 07 '11 at 12:24
  • 6
    Sorry, sloppy language in my comment. `2.2e-16` is actually the smallest `x` s.t. `1 + x != 1`, which loosely translates as "how accurately can you distinguish numbers". You can represent smaller numbers (see `.Machine$double.xmin`) but with less accuracy. – Richie Cotton Aug 07 '11 at 12:39
  • 2
    A further reason this is quite possibly theoretically wrong: the proposed use is a t.test on what are most probably counts from a genetic study (the only plausible domain where `4e13` tests would make sense), which would be applying a Normal-theory test to the wrong type of data. – IRTFM Aug 07 '11 at 13:47
  • Unfortunately, I can't delete the answer as it's been accepted, but if the OP will fix that I will gladly delete the incorrect solution, sorry for the confusion. – eWizardII Aug 08 '11 at 17:58
  • 3
    See my answer: I think you're right, although my answer does provide a little bit of additional information. I gave you a +1 to counterbalance the negative ratings. We can see if the other commenters still disagree ... – Ben Bolker Aug 14 '11 at 22:39
  • 4
    I completely misunderstood the OPs question and in light of Ben's Answer and discussion on chat, I should apologise and undo my downvote (hence the edit so I could unlock my vote). – Gavin Simpson Aug 15 '11 at 16:59
5

Some R packages address this issue. One good option is the pspearman package.

install.packages("pspearman")
library("pspearman")
a <- c(1:110, 110)
b <- 1:111
out <- spearman.test(a, b, alternative = "greater", approximation = "t-distribution")
out$p.value

[1] 3.819961e-294

user1277593
2

Recently had the same problem. A fellow statistician recommends:

A <- cor.test(…)
p <- 2 * pt(A$statistic, df = A$parameter, lower.tail = FALSE)
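A hedged sketch of the idea with made-up data (x and y below are hypothetical; note the abs() suggested in the comment below, and that pt also accepts log.p = TRUE if you want the log-scale value):

set.seed(1)
x <- rnorm(200); y <- x + rnorm(200)   # hypothetical correlated data
A <- cor.test(x, y)
p <- 2 * pt(abs(A$statistic), df = A$parameter, lower.tail = FALSE)
unname(p)   # matches A$p.value at full precision, but recomputed by hand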
Vince
  • this is not bad but you should use `abs(A$statistic)` in case your statistic happens to be negative. `A$p.value` would also work. – Ben Bolker Dec 27 '21 at 14:50
0

This is a popular question, but surprisingly no answer has mentioned using a logarithmic representation as a solution.

In some research areas, notably bioinformatics (especially genomics, but increasingly other -omics fields as well), exact log10(p-values) are used to compare evidence against the null. Logs of p-values can be obtained in R for the common tests by passing log.p=TRUE to the appropriate cumulative distribution function (e.g. pt).

t-test

a = 1:10
b = 110:120

log10_t_test = function(...) {
    model = t.test(...)
    # note: you need to modify below if passing `alternative` arg
    log_e_p = log(2) + unname(pt(abs(model$statistic), df=model$parameter, lower.tail=FALSE, log.p=TRUE))
    model$log10_pvalue = log_e_p / log(10)
    model
}

model = log10_t_test(a, b)
model$log10_pvalue  # gives  -24.6699

which you can evaluate against naive computation of log10(p):

t(sapply(seq(2, 7), function(order_of_magnitude) {
    n = 10 ** order_of_magnitude
    a = rnorm(n, mean=0)
    b = rnorm(n, mean=0.05)
    model = log10_t_test(a, b)
    c(
        proper_log10p=model$log10_pvalue,
        naive_log10p=log10(model$p.value)
    )
}))
proper_log10p naive_log10p
-0.2239816 -0.2239816
-1.4719561 -1.4719561
-0.7009232 -0.7009232
-30.7052283 -30.7052283
-250.8593000 -250.8593000
-2737.2759952 -Inf

Correlation

log10_cor_test = function(x, y, ..., method='pearson') {
    model = cor.test(x, y, ..., method=method)
    if (method == 'spearman') {
        r = model$estimate
        n = length(x)  # note: this assumes no missing values
        statistic = r * sqrt((n - 2) / (1 - r**2))
        df = n - 2
    } else {
        statistic = model$statistic
        df = model$parameter
    }
    log_e_p = log(2) + unname(pt(abs(statistic), df=df, lower.tail=FALSE, log.p=TRUE))
    model$log10_pvalue = log_e_p / log(10)
    model
}

Pearson

Conveniently, the Pearson test also uses a t-statistic, and both the statistic and the degrees-of-freedom parameter can be extracted directly from the cor.test result.

Comparison:

t(sapply(seq(2, 7), function(order_of_magnitude) {
    n = 10 ** order_of_magnitude
    a = seq(1, n)
    b = a + rnorm(n) * 10**7 # add strong noise
    model = log10_cor_test(a, b)
    c(
        proper_log10p=model$log10_pvalue,
        naive_log10p=log10(model$p.value)
    )
}))
proper_log10p naive_log10p
-2.435782e+00 -2.4357824
-6.848772e-01 -0.6848772
-1.073320e-01 -0.1073320
-7.630908e-01 -0.7630908
-1.815829e+02 -181.5829491
-1.735492e+05 -Inf

Spearman

This one requires more work, as we need to compute the degrees of freedom (n - 2) and the test statistic ourselves.

If you are happy with the t-distribution approximation, you can compute the test statistic using the formula r * sqrt((n - 2) / (1 - r**2)) and re-use the same pt function.

Comparison:

t(sapply(seq(2, 7), function(order_of_magnitude) {
    n = 10 ** order_of_magnitude
    a = seq(1, n)
    b = a + rnorm(n) * 10**7 # add strong noise
    model = log10_cor_test(a, b, method='spearman')
    c(
        proper_log10p=model$log10_pvalue,
        naive_log10p=log10(model$p.value)
    )
}))
proper_log10p naive_log10p
-6.525561e-02 -0.06535951
-3.388555e-01 -0.33891807
-7.684660e-03 -0.00768466
-1.337620e-02 -0.01337620
-1.874304e+02 -187.43039010
-1.682590e+05 -Inf
krassowski