Exponential Fit: optimize.curve_fit and stats.expon.fit produce different results

Question

I am trying to fit histograms to an exponential distribution using two different methods based on the answers I have read here. I am interested in obtaining the inverse of the scale parameter of the distribution.

Following the answer given here (Histogram fitting with python), I use the fit method of the scipy.stats.expon distribution.

import glob
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(5, 1, sharex = True)
j = 0

for files in glob.glob("data_*"):

    time = []
    hist = []

    with open(files, 'r') as f:
         for line in f:
             line = line.split(' ')
             time.append(float(line[0]))
             H.append(float(line[1]))

    P  = ss.expon.fit(H, floc = 0)
    T  = np.linspace(0,200, 1000)
    rP = ss.expon.pdf(T, *P)

    ax[j].plot(T, rP, lw = 3.0)
    ax[j].hist(H,bins = 30, alpha = 0.6, label = r"$\lambda = $" + str(1/P[1]), density = True, stacked = True)
    ax[j].set_yticks([])
    ax[j].legend()

    j = j +1 

sns.despine(top = True, left = True, right = True)
plt.xlabel("Time")
plt.show()

By doing so, I obtain the following plot:

The fit looks nice, but I would like to know the uncertainty/error lambda value. There is no information about how to get this in the stats.expon documentation.

This question has already been asked here (Is there a way to get the error in fitting parameters from scipy.stats.norm.fit?). The accepted answer suggested using curve_fit to fit the histogram instead. Therefore, following the tutorial here (https://riptutorial.com/scipy/example/31081/fitting-a-function-to-data-from-a-histogram), I tried using curve_fit. Here is the modified code (I have inserted these lines instead of using scipy.stats.expon):


    def func(x, a):
        return a*np.exp(-a*x)

    bins = np.linspace(0, 200, 201)
    data_entries, bins = np.histogram(np.array(H), bins = bins)
    binscenters = np.array([0.5 * (bins[i] + bins[i + 1]) for i in range (len(bins)-1)])
    popt, pcov = curve_fit(func, xdata = binscenters, ydata = data_entries)

    ax[j].plot(T, func(T, *popt))
    ax[j].hist(H, bins = 30, alpha = 0.6, label = r"$\lambda = $" + str(popt[0]), density = True, stacked = True)

This fit produces results that are very different from stats.expon.fit, and that seem to (at least qualitatively) fit the data worse.

Am I using curve_fit incorrectly? I believe that in some limit, curve_fit and expon.fit should produce the same results. Is there a way I could get the error in the estimated lambda from expon.fit? I am thinking of computing the relative error between the mean of the data and the lambda estimated from the initial fit, but I don't know if this would be correct. Any hint would be greatly appreciated.

In the first code example, the `hist` function misses the data (if I read correctly). In the second, you do an histogram over data that is called "hist", is that intended? — Pierre de Buyl, Apr 28 '20 at 09:30
@PierredeBuyl follow what I saw in the examples linked, so it is indeed deliberate. However, I don't know if this is the reason for such different fits. — ma7642, Apr 28 '20 at 09:41
The first issue (no data passed to hist) is not found in the links. As the data is not included, I cannot check the code myself. I suggest that you provide a [mcve] so that I (or others) can check your code directly. Else, our suggestions will be useless. — Pierre de Buyl, Apr 28 '20 at 09:53
One way to make such an example without uploading data is to generate suitable random data in your code. — Pierre de Buyl, Apr 28 '20 at 09:54
Sorry, I understand how it could be misleading; I actually misunderstood your first comment and there was a typo in my question. I have corrected it now. The histogram in the first example doesn't miss the data, this was my mistake. — ma7642, Apr 28 '20 at 10:00
@PierredeBuyl I was following your advice to produce suitable random data and realised that my problem was a mathematical one, rather than a code issue. Thank you for the suggestion! — ma7642, May 05 '20 at 14:51

score 1 · Accepted Answer · answered May 05 '20 at 14:50

I managed to solve my problem. It turns out I am lacking density = True on numpy.histogram.

The function

def func(x, a):
        return a*np.exp(-a*x)

is the exponential PDF. Since my data was not normalized (therefore, not a PDF), the fit using curve_fit was incorrect. With this modification, both ss.expon.fit and curve_fit produce the same lambda value.

Exponential Fit: optimize.curve_fit and stats.expon.fit produce different results

1 Answers1