4

I would like to generate a sample from a log-normal distribution with Pareto tail in R. Can somebody help me? Thanks.

stochazesthai
  • 617
  • 1
  • 7
  • 20
  • Does http://stackoverflow.com/q/14048401/602276 help you? – Andrie Apr 22 '14 at 15:31
  • Not really: I know how to generate a random sample from a log-normal. The problem is how to generate a random sample from a log-normal with Pareto tail. – stochazesthai Apr 22 '14 at 15:33
  • Have you looked at [this paper](http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf), esp. Eqn. 8,9, and 23? – jlhoward Apr 22 '14 at 16:49

1 Answers1

11

I'm not sure this is what you're looking for, but there is a fair amount of literature on the topic of Double Pareto Log-normal Distributions, or so-ca.led dPlN. See this, and this, and this. These are used to simulate everything from the distribution of earnings and income, to oil field size, to internet traffic.

If this is not what you're looking for, let me know and I'll delete the post.

You ask how to generate a random sample from dPlN. Theoretically, it is possible to generate a random sample from an arbitrary distribution by generating a random sample from the uniform distribution, U[0,1] and transforming that using the inverse CDF of the target distribution.

So first, we need the PDF of the dPlN, then we integrate that to find the CDF, then we invert that to find the inverse CDF. The PDF of dPlN is given in the first reference in Eqn 8 and 9:

where α and β are the location parameters and ν and τ2 are the mean and variance of the log-normal distribution. Φ and Φc are the CDF and complementary CDF of N[0,1]. Crudely, smaller α and β mean longer tail, ν influences the position of the peak, and τ influences the width of the peak.

So in R, we calculate the PDF, CDF, and inverse CDF of dPlN as follows:

f = function(x,alpha, beta, nu, tau) {   # probability density of dPlN
  A = function(theta, nu, tau) exp(theta*nu +(alpha*tau)^2/2)
  c = alpha*beta/(alpha+beta)
  z.alpha = (log(x) - nu - alpha*tau^2)/tau
  z.beta  = (log(x) - nu + beta*tau^2)/tau
  t.alpha = x^-(alpha+1)*A(alpha,nu,tau)*pnorm(z.alpha)
  t.beta  = x^(beta-1)*A(-beta,nu,tau)*(1-pnorm(z.beta))
  return(c*(t.alpha + t.beta))
}
F = function(x,alpha,beta,nu,tau) {      # cumulative density function of dPlN
  ifelse(x > 1e4, 1, integrate(f,0.001,x,alpha,beta,nu,tau)$value)}
F = Vectorize(F, vectorize.args="x")

F.inv = function(y, alpha,beta,nu,tau){  # inverse CDF of dPlN
  uniroot(function(x, alpha,beta,nu,tau){F(x, alpha,beta,nu,tau)-y},
          interval=c(0,1e6),alpha,beta,nu,tau)$root
}
F.inv = Vectorize(F.inv, vectorize.args="y")

x=seq(0,50,length.out=1000)
y=seq(0,.995,length.out=1000)

par(mfrow=c(1,3))
plot(x,f(x,2,2,2,1),type="l",main="f(x)")
plot(x,F(x,2,2,2,1),type="l",main="CDF of f(x)")
plot(y,F.inv(y,2,2,2,1),type="l",main="Inverse CDF of f(x)")

Finally, we use F.inv(...) to generate random variates of the dPlN, and plot the results to demonstrate that the random sample does indeed follow the expected probability distribution.

# random sample from dPlN (double Pareto Lognormal distribution)
X = runif(1000,0,1)   # random sample from U[0,1]
Z = F.inv(X,2,2,2,1)

par(mfrow=c(1,1))
hist(Z, breaks=c(seq(min(x),max(x),length=50),Inf), 
     xlim=range(x), freq=FALSE)
lines(x,f(x,2,2,2,1),main="Density function",
      xlim=range(x), col="red", lty=2)

Disclaimer This code has not been tested with all possible values of alpha, beta, nu, and tau, so there are no guarantees it will work under all circumstances.

jlhoward
  • 58,004
  • 7
  • 97
  • 140
  • Thank you a billion! That's exactly what I was looking for! I owe you a beer, man. :-) – stochazesthai Apr 23 '14 at 05:34
  • 2
    Wow! All that excellent codilicious work and then no upvote or checkmark? Arrgh. I wish I could click around around with magic reputation dust and sprinkle it about. – IRTFM May 28 '14 at 00:59
  • I understand this is a continuation of this work: https://stat.ethz.ch/pipermail/r-help/2013-November/363156.html Were the issues noted in this discussion addressed in the above code? Also, how might one fit a dPlN PDF to data? MLE or NLS? It would be nice to use these with the fitdistrplus package. – Adam Erickson Dec 09 '15 at 02:50
  • I meant only in the sense that it is tackling the challenge of implementing the equations of the Reed paper. – Adam Erickson Dec 10 '15 at 16:32