
Using R and the quantreg package, I am performing quantile regression analyses on my data.

I can access the p-values via the se (standard error) estimator in the summary function, as below; however, I only get 5 decimal places and would like more.

library(quantreg)

model <- rq(outcome ~ predictor)
summary(model, se="ker")

Call: rq(formula = outcome ~ predictor)

tau: [1] 0.5

Coefficients:
            Value    Std. Error t value  Pr(>|t|)
(Intercept) 78.68182  2.89984   27.13312  0.00000
predictor    0.22727  0.03885    5.84943  0.00000

How might I get access to more decimal places on the p-values?


Update

OK, so I can get some more decimal places by selecting the sub-object that contains the matrix of numerical results:

> summary(model, se="ker")[[3]]
                 Value Std. Error   t value     Pr(>|t|)
(Intercept) 78.6818182 3.13897835 25.066059 0.000000e+00
predictor    0.2272727 0.04105681  5.535567 4.397638e-08

However, the p-value is still rounded to 0 when the value is < 1e-12 (the output above is from a simplified example model). I can get some more by applying the suggestion from @seancarmody:

format(summary(model, se="ker")[[3]], digits=22)

But if P < 1e-22 it is still rounded to 0, and if digits is set to anything greater than 22 I get the following error:

format(summary(model, se="ker")[[3]], digits=23)

Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : invalid 'digits' argument
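For reference, format() seems to only accept digits up to 22, and as I understand it a double only stores about 15-17 significant digits anyway (a quick generic check of R's floating-point limits, nothing specific to quantreg):

# machine epsilon: ~2.2e-16, i.e. roughly 15-16 significant decimal digits
.Machine$double.eps

# smallest positive normalised double: ~2.2e-308
.Machine$double.xmin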

Is it possible to access even more decimal places?


2 Answers


Have a look at str(model). You can see there is a component coefficients, which gives you a numeric vector with higher precision than the one displayed by summary. You can also look at these with

format(model$coefficients, digits=12)

Note that this converts the coefficients to strings.
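If you would rather avoid the conversion to strings, base R's print() also takes a digits argument; a small aside, using the same model object:

# display more significant digits; model$coefficients itself stays numeric
print(model$coefficients, digits = 12)

# or raise the session-wide default used when autoprinting (maximum is 22)
options(digits = 12)
model$coefficients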

To examine the p.values more closely, try

summary(model, se="ker")$coefficients[,4]*1e27

although I very much doubt that there is any meaning to be found in the digits of a p.value less than 1e-27!

  • Thanks for that - although it only works for the coefficients? I've also found that if you specify the sub-object that contains the numerical coefficients/p-values you get the full answer too: "> summary(model, se="ker")[[3]] " – Luke Aug 21 '12 at 10:46
  • I have updated my question to reflect the exploration I have done with your suggestion. I would greatly appreciate an update to your answer. Thanks. – Luke Aug 21 '12 at 11:35
  • That doesn't appear to work (0 * 1e27 = 0) unfortunately. Thanks for your input though. The reason I need such precise values is that I am performing many thousands of tests in parallel, so I therefore need to adjust the P-values accordingly (however, as far as my test is concerned a P-value < 1e-22 is effectively 0 - I would just like to be as precise as possible) – Luke Aug 21 '12 at 12:57
  • I'd say you are pushing the bounds of precision here! – seancarmody Aug 21 '12 at 13:07
  • Ok. I'll see if I get any more methods before accepting your answer (22 decimal places is certainly better than 12, but there may be a way to access the exact value). Cheers – Luke Aug 21 '12 at 13:18
  • Of course, but I would still like to rank the associations (regardless of how strong they are), so while it does not have to be 'exact', I would like to compare the outcomes of many models using the same method, rather than report a list of associations with P=0. Thanks again for the input. – Luke Aug 21 '12 at 22:20

To get any further I think you have to dig in and see how the p-values are calculated. In particular, summary.rq has the following snippet:

  coef[, 4] <- if (rdf > 0) 
        2 * (1 - pt(abs(coef[, 3]), rdf))
    else NA
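To see where the precision goes, compare two ways of computing an upper-tail probability (a standalone sketch with arbitrary values, not taken from the model above):

# subtracting from 1 cancels: any tail probability much below ~1e-16 is
# lost, and here the result underflows to exactly 0
1 - pt(15, df = 50)

# asking pt() for the upper tail directly keeps the small value
pt(15, df = 50, lower.tail = FALSE)

# and the log scale avoids underflow entirely
pt(15, df = 50, lower.tail = FALSE, log.p = TRUE)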

The 2 * (1 - pt(...)) form is a fairly imprecise way to compute the p-value (which under ordinary circumstances doesn't really matter): for large |t|, pt(abs(t), rdf) is so close to 1 that the subtraction loses everything beyond machine precision. You can get the maximum amount of precision by retrieving the log of the p-value [for example, you could in principle retrieve p-values less than 1e-308, roughly the smallest value that R can represent as a double-precision number], e.g.

ss <- summary(model,se="ker")
log(2)+pt(abs(ss$coefficients[,"t value"]),
     lower.tail=FALSE,log.p=TRUE,df=ss$rdf)

The lower.tail=FALSE argument gives you the complement (upper-tail) value of the CDF; log.p=TRUE says you want the log value; adding the log(2) makes it two-sided.
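If the point of the extra precision is to adjust or rank thousands of p-values, that can also be done entirely on the log scale; for example, a Bonferroni-style adjustment (a sketch, reusing ss from above; m stands in for however many tests you are running):

log_p <- log(2) + pt(abs(ss$coefficients[, "t value"]),
                     df = ss$rdf, lower.tail = FALSE, log.p = TRUE)

m <- 10000                      # hypothetical number of tests
# Bonferroni: p_adj = min(1, m * p), so log(p_adj) = min(0, log(m) + log(p))
log_p_adj <- pmin(0, log(m) + log_p)

# log() is monotone, so ranking by log_p gives the same order as ranking by p
order(log_p)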

  • Thanks for this - although I'm not savvy enough to really understand what has happened. You are estimating P (using methods similar to those in the summary function), however the value comes out at -39.4 (in an example I have been using before, where the P has always come out as 0). Exponentiating this gives ~1e-17 - so is this the 'more precise' value you refer to, because it is a two-sided P-value? Thanks again. – Luke Aug 22 '12 at 13:32
  • Yes. If you use the method I suggested (and exponentiate it), the answers should match the answers given by `summary` up to the relevant precision. The reason I computed on the log scale is that for **very** small values (< 1e-308), the p-values will underflow to zero in R no matter what you do (see `?.Machine`) -- keeping them on the log scale avoids this. If all you want to do is *rank* the p-values, the log p-values will give you the same result. – Ben Bolker Aug 22 '12 at 13:36
  • Great - this works very well. I get the same significance estimates for the values that I can see normally, and get actual values for those that the summary function would normally round to 0. Many thanks indeed. – Luke Aug 24 '12 at 11:04