5

Based on formulas given in the Mathematica UUPDE database I've plotted the hazard function for the standard normal distribution in R.

It seems to be correct in certain range; the numerical issues occur for larger values, see attached figure. Below is the complete R code.

Any comments would be very appreciated.

PDF, CDF, HF and SF plots for standard normal distribution

PDF = function(x) {  1/(sqrt(2*pi))*exp(-x^2/2) }
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower = FALSE)
CDF = function(x) {  1/2 * (1 + erf(x/(sqrt(2)))) }
HF = function(x) { sqrt(2/pi)/(exp(x^2/2)*(2-erfc(-x/sqrt(2)))) }
SF = function(x) { 1 - 1/2 *erfc(-x/sqrt(2)) }

par(mar=c(3,3,1.5,0.5), oma=c(0,0,0,0), mgp=c(2,1,0))
par(mfrow = c(2, 2))

x = seq(from = -4,to = 10,by = .001)

##### PDF
a = PDF(x)
plot(x,a,'l',main='',ylab="PDF",xlab="x")
grid(nx = NULL,ny = NULL,col = "grey",lty = "dotted",lwd = par("lwd"),equilogs = TRUE)

##### CDF
a = CDF(x)
plot(x,a,'l',main='',ylab="CDF",xlab="x")
grid(nx = NULL,ny = NULL,col = "grey",lty = "dotted",lwd = par("lwd"),equilogs = TRUE)

##### HF
a = HF(x)
plot(x,a,'l',main='',ylab="HF",xlab="x")
grid(nx = NULL,ny = NULL,col = "grey",lty = "dotted",lwd = par("lwd"),equilogs = TRUE)

##### SF
a = SF(x)
plot(x,a,'l',main='',ylab="SF",xlab="x")
grid(nx = NULL,ny = NULL,col = "grey",lty = "dotted",lwd = par("lwd"),equilogs = TRUE)
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
mjs
  • 150
  • 1
  • 20
  • 1
    What is your question? Do you need arbitrary precision or are you interested in a specific range only? – Roland Sep 15 '16 at 11:47
  • 1
    Side note: it's interesting to me that you opted to use the built in normal cdf function to get the erf function and then use that to bake your own cdf function. – Dason Sep 15 '16 at 11:58
  • @Roland I was merely interested in a correct hazard function plot in general and wondered what went wrong – mjs Sep 15 '16 at 12:36
  • @Dason, admittedly it's a quite inefficient way to use R on my side. – mjs Sep 15 '16 at 13:01

1 Answers1

11

The hazard function is the density function divided by the survivor function. The problem with your code is that you are taking this definition at face value and doing a simple division operation; when both the numerator and the denominator are very small values (on the order of 1e-300), which happens in the tail of the distribution, this operation becomes numerically unstable. For this kind of problem, the more appropriate solution is to compute the logarithms of the numerator and denominator (which are moderate-sized negative numbers rather than tiny numbers), subtract the log-denominator from the log-numerator, then exponentiate.

R provides all the pieces you need to do this calculation. You can get the survivor function via pnorm(x,lower=FALSE); you can get the density and the survivor functions on the log scale by using log=TRUE and log.p=TRUE in dnorm() and pnorm() respectively. So:

HF <- function(x) {
   exp(dnorm(x,log=TRUE)-pnorm(x,lower=FALSE,log.p=TRUE))
}
curve(HF,from=-4,to=10)

enter image description here

This strategy can be generalized to compute the hazard function for any distribution provided the log-density and log-survivor functions are available (in general for a distribution foo R provides density function dfoo and CDF pfoo which can be substituted above).

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thanks, very useful comments! – mjs Sep 15 '16 at 12:38
  • 1
    While the sentiment is appreciated, StackOverflow deprecates [using comments to say "thank you"](http://meta.stackoverflow.com/questions/258004/should-thank-you-comments-be-flagged?lq=1); if this answer was useful you can upvote it (if you have sufficient reputation), and in any case if it answers your question satisfactorily you are encouraged to click the check-mark to accept it. – Ben Bolker Sep 15 '16 at 12:39