1

Objective

Find the best fitted distribution for data and then generate random numbers from this distribution.

Example

Using the gamlss package in R, I found that the best fit distribution is "Skew exponential power (Azzalini type 1)":

library(gamlss)
library(gamlss.dist)
library(gamlss.add)


m1 <- fitDist(mtcars$wt, k = 2, type = "realAll", trace = FALSE, try.gamlss = TRUE)

  
summary(m1)
# *******************************************************************
#   Family:  c("SEP1", "Skew exponential power (Azzalini type 1)") 
# 
# Call:  gamlssML(formula = y, family = DIST[i]) 
# 
# Fitting method: "nlminb" 
# 
# 
# Coefficient(s):
#   Estimate   Std. Error     t value Pr(>|t|)
# eta.mu     3.440000000  0.000149516 23007.56651  < 2e-16
# eta.sigma -1.856665040  0.861826733    -2.15434 0.031214
# eta.nu     0.150728244           NA          NA       NA
# eta.tau   -3.524272086           NA          NA       NA
# 
# eta.mu    ***
#   eta.sigma *  
#   eta.nu       
# eta.tau      
# ---
#   Signif. codes:  
#   0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Degrees of Freedom for the fit: 4 Residual Deg. of Freedom   28 
# Global Deviance:     -27.4747 
# AIC:     -19.4747 
# SBC:     -13.6117 
# Warning message:
#   In sqrt(diag(object$vcov)) : NaNs produced

Error

But the values of sigma and tau are negative. When I provide these values to rSEP1() to generate a random number, it throws the following error:

rSEP1(1, mu = 3.44, sigma = -1.8, nu = 0.15, tau = -3.5)
# Error in rSEP1(1, mu = 3.44, sigma = -1.8, nu = 0.15, tau = -3.5) : 
#   sigma must be positive

Are these values transformed? How can I provide correct inputs to rSEP1()?

umair durrani
  • 5,597
  • 8
  • 45
  • 85
  • 1
    If you run `gamlssML(mtcars$wt, family=SEP1)` you will get a warning about a convergence problem so maybe the coefficients are just meaningless. – Robert Hacken Mar 14 '23 at 21:52
  • @RobertHacken thanks for your comment. Could you please elaborate on why the coefficients *maybe* meaningless? Is there a way to find out if they are definitely meaningless? I see this warning: `possible convergence problem: optim gave code=1 false convergence` – umair durrani Mar 14 '23 at 23:02
  • 1
    It appears I was overly suspicious, thinking that the fitting algorithm, not converging properly, might have returned coefficients that were not valid for the given distribution. However, after reading Allan's nice answer I see that this is not the case and the coefficients are perfectly meaningful when transformed correctly (albeit not much reliable). – Robert Hacken Mar 15 '23 at 14:49

1 Answers1

3

If you look at the link functions for the parameters, you will see:

SEP1()

#> GAMLSS Family: SEP1 Skew exponential power (Azzalini type 1) 
#> Link function for mu   : identity 
#> Link function for sigma: log 
#> Link function for nu   : identity 
#> Link function for tau  : log 

So the values for sigma and tau that fitDist returns are the log of the numbers you would plug into rSEP1. In other words, the correct way to run rSEP1 from your model is:

rSEP1(1, mu = 3.44, sigma = exp(-1.8), nu = 0.15, tau = exp(-3.5))
#> [1] 3.460991

To show that this is the case, let us create a set of random numbers from a defined rSEP1 where sigma = 0.5, and tau = 0.5. If we then find the best-fitting distribution, we should get results close to log(0.5) for both sigma and tau. Since log(0.5) is about -0.69, we would expect values of about -0.69 for both these parameters:

set.seed(1)
testvec <- rSEP1(10000, mu = 3.5, sigma = 0.5, nu = 2.5, tau = 0.5)
m2 <- fitDist(testvec, k = 2, type = "realAll", trace = FALSE, try.gamlss = TRUE)

m2
#> Family:  c("SEP1", "Skew exponential power (Azzalini type 1)") 
#> Fitting method: "nlminb" 
#> 
#> Call:  gamlssML(formula = y, family = DIST[i]) 
#> 
#> Mu Coefficients:
#> [1]  3.5
#> Sigma Coefficients:
#> [1]  -0.6828
#> Nu Coefficients:
#> [1]  2.421
#> Tau Coefficients:
#> [1]  -0.6952

And plugging the exponents of sigma and tau into dSEP1 gives us a near-perfect fit to the density of our test vector:

d <- density(testvec)
plot(d$x, d$y)
lines(d$x, dSEP1(d$x, mu = 3.5, sigma = exp(-0.68), nu = 2.42, tau = exp(-0.69)),
     col = "red", lwd = 2, lty = 2)

enter image description here

It's worth pointing out that the actual fit obtained for mtcars$wt in the example is pretty terrible. The small value of sigma jyst means that most random values drawn from the distribution will be close to the mean of mtcars$wt. There are only 32 points in the data set, which makes it very difficult to automatically fit a parametric distribution accurately.

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87