Method for fitting a PDF from a histogram in Python

Question

I have a sample (r) in the form of a NumPy array. I created a histogram from that sample:

import numpy as np
import matplotlib.pyplot as plt

plt.hist(x=r, bins='auto', color='#0504aa', alpha=0.7, rwidth=0.85)
objective_prob, bin_edges = np.histogram(r, bins='auto', density=True)
bin_centers = 0.5*(bin_edges[1:] + bin_edges[:-1])

What I tried to do here was to get the values of a possible PDF, which is supposed to be defined as

def f(x, N):
    return N/k*T*np.exp(-N*x**2/2*k*T)

with x being the sample (r in this case) and N the parameter to be fitted. k and T are constants. It's a Boltzmann distribution.
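One thing worth checking in the definition above is operator precedence. Python evaluates `/` and `*` from left to right, so `N/k*T` is read as `(N/k)*T`, and `x**2/2*k*T` is read as `(x**2/2)*k*T`. The following is a minimal sketch, assuming the intended formula is N/(kT) · exp(-N x² / (2kT)). The name `f_parenthesized` and the values of `k` and `T` are placeholders for illustration and are not taken from the post.

```python
import numpy as np

# Placeholder constants (assumptions, chosen only so the numbers are easy to read).
k = 2.0
T = 3.0

def f(x, N):
    # As written in the question: parsed as (N/k)*T * exp(-(N*x**2/2)*k*T)
    return N/k*T*np.exp(-N*x**2/2*k*T)

def f_parenthesized(x, N):
    # Hypothetical rewrite with explicit grouping: N/(k*T) * exp(-N*x**2 / (2*k*T))
    return N / (k * T) * np.exp(-N * x**2 / (2 * k * T))

x = np.array([0.0, 0.5, 1.0])
N = 1.0
as_written = f(x, N)
grouped = f_parenthesized(x, N)

# The two versions disagree unless k*T happens to equal 1.
print(as_written)
print(grouped)
```

If the grouped form is the one that was meant, the two functions differ by a factor of (kT)² in the prefactor and in the exponent, which can matter a great deal when kT is far from 1.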

First question: Is my array "objective_prob" correct in the sense that it correctly gives the values of a Probability Density Function (PDF)? I am asking because I'm unsure if I understood the 'density=True' argument correctly. Second question: Am I right to use the array bin_centers for my x-axis?
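A quick way to check the first point is to confirm that the histogram values integrate to 1 over the bins, which is what `density=True` is documented to guarantee. The sketch below uses a synthetic stand-in for `r` (a normal sample of 500 values around 1.0, which is an assumption, since the actual data is not shown).

```python
import numpy as np

# Synthetic stand-in for the sample r (assumption: the real data is not available).
rng = np.random.default_rng(0)
r = rng.normal(loc=1.0, scale=0.2, size=500)

objective_prob, bin_edges = np.histogram(r, bins='auto', density=True)
bin_widths = np.diff(bin_edges)
bin_centers = 0.5 * (bin_edges[1:] + bin_edges[:-1])

# With density=True, sum(height * width) over all bins should equal 1
# up to floating-point rounding.
area = np.sum(objective_prob * bin_widths)
print(area)

# One density value per bin, so bin_centers lines up with objective_prob.
print(len(bin_centers), len(objective_prob))
```

Since there is exactly one density value per bin, pairing each value with its bin center is a common choice for the x-data of a fit.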

Next comes the important step: I want to do the fit, so that I get the parameter N of my function f:

from scipy.optimize import curve_fit
params, params_covariance = curve_fit(f, bin_centers, objective_prob, p0=None)

Here I use my array bin_centers as x-data, and objective_prob as y-data. Now, the value for N is completely off, and if I try to plot it with

plt.figure(figsize=(6, 4))
plt.plot(bin_centers, objective_prob, label="Histogram")
plt.plot(bin_centers, f(bin_centers, params[0]), 'r-')
plt.legend()
plt.show()

I get a straight line for my fitted curve. So third question: Is my curve_fit wrong? Where else could I be wrong? Is it correct to use the respective arrays in my curve_fit, or am I using the bin-centers wrongly?
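One possible contributor to a flat fitted curve is the starting value: with `p0=None`, `curve_fit` starts every parameter at 1, which can be many orders of magnitude away from a sensible N when k and T are physical constants. The sketch below is not a diagnosis of the fit above, since the data is not shown. It assumes the parenthesized form of the formula, fits the combined parameter `a = N/(k*T)` instead of N directly, and uses synthetic data. The names `model`, `a_true`, and `a_fit` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a):
    # Assumed shape a * exp(-a * x**2 / 2), with a = N/(k*T) as the only parameter.
    return a * np.exp(-a * x**2 / 2)

# Synthetic y-values from the model plus a little noise (not real histogram data).
rng = np.random.default_rng(1)
a_true = 0.25
x = np.linspace(0.1, 6.0, 40)
y = model(x, a_true) + rng.normal(scale=0.002, size=x.size)

# Data-driven starting value: for this shape, the curve approaches a as x -> 0,
# so the largest observed value is a reasonable first guess.
p0 = [y.max()]
params, params_covariance = curve_fit(model, x, y, p0=p0)
a_fit = params[0]
print(a_fit)
```

Under these assumptions, N would then be recovered as `a_fit * k * T`. Fitting a parameter of order 1 and rescaling afterwards tends to be more robust than asking the optimizer to search across many orders of magnitude.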

Any help is greatly appreciated! Thanks.

Lots of questions. What do the numpy docs say about `np.histogram(...density=True)`? Do the results look correct to you? Do the results compare to the function provided? — wwii, Jan 07 '20 at 21:27
Yes, the plotted results do look correct, it's just the fit that gives weird values. And as far as I can tell from the numpy docs, density=True should give the correct values. I asked to make sure my mistake for the curve_fit isn't there. — LionCereals, Jan 07 '20 at 21:30
Related: [Probability density function numpy histogram/scipy stats](https://stackoverflow.com/questions/30326623/probability-density-function-numpy-histogram-scipy-stats) ... [Fitting a histogram with python](https://stackoverflow.com/questions/7805552/fitting-a-histogram-with-python) — wwii, Jan 07 '20 at 21:32
[Fit a curve to a histogram in Python](https://stackoverflow.com/questions/35544233/fit-a-curve-to-a-histogram-in-python) — wwii, Jan 07 '20 at 21:48
[Why does scipy.optimize.curve_fit not fit to the data?](https://stackoverflow.com/questions/15624070/why-does-scipy-optimize-curve-fit-not-fit-to-the-data) — wwii, Jan 07 '20 at 23:46
Do you have a minimal example of `r` that you can share to complete your [mcve]? — wwii, Jan 08 '20 at 02:31
r is an array of 500 entries with coordinates centered around 1 mm — LionCereals, Jan 08 '20 at 08:06
If r was an array with 250 entries of 0 and 250 entries of 2, that would be 500 entries centered around 1 - but this would not help answer your question. Would you please post or link to the data in r? — James Phillips, Jan 08 '20 at 23:10
