Power law fitted by `fitdistr()` function in package `fitdistrplus`

Question

I generate some random variables using rplcon() function in package poweRlaw

data <- rplcon(1000,10,2)

Now, I want to know which known distributions fit the data best. Lognorm? exp? gamma? power law? power law with exponential cutoff?

So I use function fitdist() in package fitdistrplus:

fit.lnormdl <- fitdist(data,"lnorm")
fit.gammadl <- fitdist(data, "gamma", lower = c(0, 0))
fit.expdl <- fitdist(data,"exp")

Due to the power law distribution and power law with exponential cutoff are not the base probability function according to CRAN Task View: Probability Distributions, so I write the d,p,q function of power law based on the example 4 of ?fitdist

dplcon <- function (x, xmin, alpha, log = FALSE) 
{
    if (log) {
        pdf = log(alpha - 1) - log(xmin) - alpha * (log(x/xmin))
        pdf[x < xmin] = -Inf
    }
    else {
        pdf = (alpha - 1)/xmin * (x/xmin)^(-alpha)
        pdf[x < xmin] = 0
    }
    pdf
}
pplcon <- function (q, xmin, alpha, lower.tail = TRUE) 
{
    cdf = 1 - (q/xmin)^(-alpha + 1)
    if (!lower.tail) 
        cdf = 1 - cdf
    cdf[q < round(xmin)] = 0
    cdf
}
qplcon <- function(p,xmin,alpha) alpha*p^(1/(1-xmin))

Finally, I use codes below to get parameter xmin and alpha of power law:

fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=1))

But it throws an error:

<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower,     upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(data, "plcon", start = list(xmin = 1, alpha = 1)) : 
  the function mle failed to estimate the parameters, 
                with the error code 100

I try to search in google and stackoverflow, and so many similar error questions appear, but after reading and trying, no solutions work in my issues, what should I do to complete it correctly to get the parameters? Thank you for everyone who does me a favor!

score 5 · Answer 1 · edited Apr 13 '17 at 12:57

This was an interesting one that I am not entirely happy with the discovery but I will tell you what I have found and see if it helps.

On calling the fitdist function, by default it wants to use mledist from the same package. This itself results in a call to stats::optim which is a general optimization function. In it's return value it gives a convergence error code, see ?optim for details. The 100 you see is not one of the ones returned by optim. So I pulled apart the code for mledist and fitdist to find where that error code comes from. Unfortunately it is defined in more than one case and is a general trap error code. If you break down all of the code, what fitdist is trying to do here is the following, subject to various checks etc beforehand.

fnobj <- function(par, fix.arg, obs, ddistnam) {
  -sum(do.call(ddistnam, c(list(obs), as.list(par), 
                           as.list(fix.arg), log = TRUE)))
}

vstart = list(xmin=5,alpha=5)
fnobj <- function(par, fix.arg obs, ddistnam) {
  -sum(do.call(ddistnam, c(list(obs), as.list(par), 
                           as.list(fix.arg), log = TRUE)))
}
ddistname=dplcon
fix.arg = NULL
meth = "Nelder-Mead"
lower = -Inf
upper = Inf
optim(par = vstart, fn = fnobj, 
      fix.arg = fix.arg, obs = data, ddistnam = ddistname, 
      hessian = TRUE, method = meth, lower = lower, 
      upper = upper)

If we run this code we find a more useful error "function cannot be evaluated at initial parameters". Which makes sense if we look at the function definition. Having xmin=0 or alpha=1 will yield a log-likelihood of -Inf. OK so think try different initial values, I tried a few random choices but all returned a new error, "non-finite finite-difference value 1".

Searching the optim source further for the source of these two errors they are not part of the R source itself, there is however a .External2 call so I can only assume the errors come from there. The non-finite error implies that one of the function evaluations somewhere gives a non numeric result. The function dplcon will do so when alpha <= 1 or xmin <= 0. fitdist lets you specify additional arguments that get passed to mledist or other (depending on what method you choose, mle is default) of which lower is one for controlling lower bounds on the parameters to be optimized. So I tried imposing these limits and trying again:

fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2), lower = c(xmin = 0, alpha = 1))

Annoyingly this still gives an error code 100. Tracking this down yields the error "L-BFGS-B needs finite values of 'fn'". The optimization method has changed from the default Nelder-Mead as you specifying the boundary and somewhere on the external C code call this error arises, presumably close to the limits of either xmin or alpha where the stability of the numerical calculation as we approach infinity is important.

I decided to do quantile matching rather than max likelihood to try to find out more

fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2),
    method= "qme",probs = c(1/3,2/3))
fitpl
## Fitting of the distribution ' plcon ' by matching quantiles 
## Parameters:
##          estimate
## xmin   0.02135157
## alpha 46.65914353

which suggests that the optimum value of xmin is close to 0, it's limits. The reason I am not satisfied is that I can't get a maximum-likelihood fit of the distribution using fitdist however hopefully this explanation helps and the quantile matching gives an alternative.

Edit:

After learning a little more about power law distributions in general it makes sense that this does not work as you expect. The parameter power parameter has a likelihood function which can be maximised conditional on a given xmin. However no such expression exists for xmin since the likelihood function is increasing in xmin. Typically estimation of xmin comes from a Kolmogorov--Smirnov statistic, see this mathoverflow question and the d_jss_paper vignette of the poweRlaw package for more info and associated references.

There is functionality to estimate the parameters of the power law distribution in the poweRlaw package itself.

m = conpl$new(data)
xminhat = estimate_xmin(m)$xmin
m$setXmin(xminhat)
alphahat = estimate_pars(m)$pars
c(xmin = xminhat, alpha = alphahat)

wow, how knowledgeable you are, I get what you mean and thank you! Another question, if I want to draw the raw data and fitted line in the same plot using `ggplot2`, how do I write codes? — Ling Zhang, May 11 '16 at 08:54
look at the `stat_smooth` help section at http://docs.ggplot2.org/current/, If the answer above answers your question, please accept it so that future searches can see that it has been solved. — jamieRowen, May 11 '16 at 09:00
sir, after several tries, I found that `qme` method is so sensitive to the `probs`, it is not a good idea — Ling Zhang, May 12 '16 at 01:06
fair enough, updated answer after some more research into power laws — jamieRowen, May 12 '16 at 20:37

Power law fitted by `fitdistr()` function in package `fitdistrplus`

1 Answers1