6

I am working with a problem that routinely needs to compute the density of the t distribution rather far in the tails in R.

For example, using R's t distribution function, dt(1.424781, 1486, -5) returns [1] 2.75818e-10. Some of my final outputs (using this density as an input) do not match a reference value from analogous computations performed in MATLAB by my colleague, which I think may be due to imprecision in the t distribution's tails in R.

If I compare to MATLAB's t distribution function, nctpdf(1.424781, 1486, -5) returns ans = 4.3932e-10, which is quite a bit different than R's output.

edit: R prints two identical warning messages

In dt(1.424781, 1486, -5) : full precision may not have been achieved
in 'pnt{final}'

This is on Mac, R version 3.3.1

lmo
  • 37,904
  • 9
  • 56
  • 69
  • 2
    Ugh.. I am getting the same result as the OP. What he doesn't mention is that the `dt` in question prints out a warning (two identical warnings for me): "In dt(1.424781, 1486, -5) : full precision may not have been achieved in 'pnt{final}'" This warning is present in [this post](http://stackoverflow.com/questions/17069766/full-precision-may-not-have-been-achieved-in-qbeta) on qbeta. I using openSuse 13.1 and R 3.3.1 Patched (2016-08-04 r71028). I'll perform an update right now and see if the result changes. – lmo Aug 27 '16 at 18:01
  • @Hack-R True, it could be the other way around. Any advice on how to check which is the more accurate? – Alexander Etz Aug 27 '16 at 18:01
  • 2
    @lmo Thanks, the warning makes it much more clear. OK, so, it seems this effects the upper tail only I believe, based on this https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16680 – Hack-R Aug 27 '16 at 18:19
  • 2
    @AlexanderEtz Now that we have the warning I think we can just focus on R, but if you wanted to check which one is correct I'd recommend referencing a value from some highly lauded textbook that's been around for many editions. – Hack-R Aug 27 '16 at 18:20
  • @lmo Yea, that sounds good, give it a go – Hack-R Aug 27 '16 at 18:32

2 Answers2

5

It appears that the issue comes from the algorithm that R implements for the non-central t-distribution for this case. Two factors combine to produce the result:

  1. the supplied non-centrality parameter, -5, is an "extreme value."

From the Note section of the help file, ?pt,

The code for non-zero ncp is principally intended to be used for moderate values of ncp: it will not be highly accurate, especially in the tails, for large values.

So the algorithm that calculates these values is not intended to calculate extreme values like -5. We can see this by reducing the ncp value to a more moderate level, say -1:

dt(1.424781, 1486, -1)
[1] 0.0211143
  1. The requested value is in the upper tail of the distribution:

The Source section of ?pt says

For the non-central case of pt based on a C translation of

Lenth, R. V. (1989). Algorithm AS 243 — Cumulative distribution function of the non-central t distribution, Applied Statistics 38, 185–189.

This computes the lower tail only, so the upper tail suffers from cancellation and a warning will be given when this is likely to be significant.

For example, the same ncp value, -5 with the negative of the x value returns

dt(-1.424781, 1486, -5)
[1] 0.0006719519

without a warning.

Community
  • 1
  • 1
lmo
  • 37,904
  • 9
  • 56
  • 69
3

You could use Boost (available through the BH package) via Rcpp as an alternative:

// [[Rcpp::depends(BH)]]

#include <Rcpp.h>
#include <boost/math/distributions/non_central_t.hpp> 

using namespace boost::math;

// [[Rcpp::export]]
double dnct(const double x, const double df, const double ncp) {
  non_central_t dist(df, ncp);
  return pdf(dist, x);
}


/*** R
dnct(1.424781, 1486, -5)
*/

This returns:

[1] 4.393078e-10

I don't know if Boost or Matlab is more precise here, but results are at least similar. The boost docs give some information regarding precision.

Roland
  • 127,288
  • 10
  • 191
  • 288