10

I have a problem with fitdistr{MASS} function in R. I have this vector:

a <- c(26,73,84,115,123,132,159,207,240,241,254,268,272,282,300,302,329,346,359,367,375,378, 384,452,475,495,503,531,543,563,594,609,671,687,691,716,757,821,829,885,893,968,1053,1081,1083,1150,1205,1262,1270,1351,1385,1498,1546,1565,1635,1671,1706,1820,1829,1855,1873,1914,2030,2066,2240,2413,2421,2521,2586,2727,2797,2850,2989,3110,3166,3383,3443,3512,3515,3531,4068,4527,5006,5065,5481,6046,7003,7245,7477,8738,9197,16370,17605,25318,58524)

and I want to fit a gamma distribution to the data with a command:

fitted.gamma <- fitdistr(a, "gamma")

but I have such error:

Error in optim(x = c(26, 73, 84, 115, 123, 132, 159, 207, 240, 241, 254,  : 
non-finite finite-difference value [1]
In addition: Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
3: In densfun(x, parm[1], parm[2], ...) : NaNs produced
4: In densfun(x, parm[1], parm[2], ...) : NaNs produced

So I tried with initializing the parameters:

(fitted.gamma <- fitdistr(a, "gamma", start=list(1,1)))

The object fitted.gamma is created but when printed, creates an error:

Error in dn[[2L]] : subscript out of bounds

Do you know what is happening or maybe know some other R functions to fit univariate distributions by MLE?

Thanks in advance for any help or response.

Kuba

ProgramFOX
  • 6,131
  • 11
  • 45
  • 51
kuba
  • 1,005
  • 2
  • 11
  • 16

2 Answers2

11

Always plot your stuff first, you scaling is far offfffffff.

library(MASS)
a <- c(26,73,84,115,123,132,159,207,240,241,254,268,272,282,300,302,329,346,359,367,375,378, 384,452,475,495,503,531,543,563,594,609,671,687,691,716,757,821,829,885,893,968,1053,1081,1083,1150,1205,1262,1270,1351,1385,1498,1546,1565,1635,1671,1706,1820,1829,1855,1873,1914,2030,2066,2240,2413,2421,2521,2586,2727,2797,2850,2989,3110,3166,3383,3443,3512,3515,3531,4068,4527,5006,5065,5481,6046,7003,7245,7477,8738,9197,16370,17605,25318,58524)
## Ooops, rater wide
plot(hist(a))
fitdistr(a/10000,"gamma") # gives warnings
# No warnings
fitted.gamma <- fitdistr(a/10000, dgamma,  start=list(shape = 1, rate = 0.1),lower=0.001)

Now you can decide what to do with the scaling

Dieter Menne
  • 10,076
  • 44
  • 67
  • Thanks for your reply. I see that adding "lower" argument with scaling made the trick. Does that mean that while optimizing gamma parameters took at some point negative values? When it comes to scaling, why is it necessary to scale the values (rate parameter is to low)? Kuba – kuba Apr 12 '13 at 15:49
  • Yes, during gradient optimization we easily run into bad gradient regions for some samples. Gamma might not be the right distribution, just try to plot a few examples. However, log(a) looks almost normal... – Dieter Menne Apr 12 '13 at 16:01
1

For data that clearly fits the gamma distribution, but is on the wrong scale (i.e., as if it had been multiplied/divided by a large number), here's an alternative approach to fitting the gamma distribution:

fitgamma <- function(x) {
  # Equivalent to `MASS::fitdistr(x, densfun = "gamma")`, where x are first rescaled to 
  # the appropriate scale for a gamma distribution. Useful for fitting the gamma distribution to 
  # data which, when multiplied by a constant, follows this distribution
  if (!requireNamespace("MASS")) stop("Requires MASS package.")

  fit <- glm(formula = x ~ 1, family = Gamma)
  out <- MASS::fitdistr(x * coef(fit), "gamma")
  out$scaling_multiplier <- unname(coef(fit))
  out
}

Usage:

set.seed(40)
test <- rgamma(n = 100, shape = 2, rate = 2)*50000
fitdistr(test, "gamma") # fails
dens_fit <- fitgamma(test) # successs
curve(dgamma(x, 2, 2), to = 5) # true distribution
curve(dgamma(x, dens_fit$estimate['shape'], dens_fit$estimate['rate']), add=TRUE, col=2) # best guess
lines(density(test * dens_fit$scaling_multiplier), col = 3)

plot of density

jwdink
  • 4,824
  • 5
  • 18
  • 20