
I'm wondering if there is a good way to match a Gaussian (normal) curve to a histogram computed as a NumPy array with np.histogram(array, bins).

How can such a curve be plotted on the same graph and scaled in height and width to match the histogram?
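For reference, np.histogram returns a tuple (counts, edges), where edges has one more element than counts. Any curve overlay needs the bin centers and widths derived from those edges. Below is a minimal sketch of that step. The helper name bin_geometry and the synthetic data are illustrative assumptions, not part of the question:

```python
import numpy as np

def bin_geometry(edges):
    """Hypothetical helper: bin centers and widths from np.histogram edges."""
    edges = np.asarray(edges, dtype=float)
    centers = (edges[:-1] + edges[1:]) / 2  # midpoint of each bin (also fine for unequal widths)
    widths = np.diff(edges)                 # width of each bin
    return centers, widths

rng = np.random.default_rng(0)
data = rng.normal(loc=4, scale=4, size=1000)   # synthetic example data
counts, edges = np.histogram(data, bins=20)    # 20 counts, 21 edges
centers, widths = bin_geometry(edges)
```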

JohanC
Sam Sloan
  • I don't think any "fitting" is needed: the normal distribution [is defined by the mean and the standard deviation of your data](https://en.wikipedia.org/wiki/Normal_distribution), so you just plug those into the formula for the PDF and plot it. – ForceBru Mar 22 '20 at 16:22
  • See also [this](https://stackoverflow.com/questions/59738337/how-to-draw-a-matching-bell-curve-over-a-histogram/59742545#59742545) and [this post](https://stackoverflow.com/questions/60091790/how-to-plot-the-density-of-states-using-histogram-with-a-curve-that-follows-the/60100773#60100773) about fitting a Gaussian normal and a KDE to a histogram. – JohanC Mar 22 '20 at 20:33
  • @ForceBru Well, simply plotting the curve will not match it to the histogram; some rescaling is needed. If the bins don't all have the same width, it wouldn't even be possible to match them. – JohanC Mar 22 '20 at 21:02
  • Also have a look at Seaborn's [`sns.distplot(array, kde_kws={'shade': True, 'color':'r'})`](https://seaborn.pydata.org/generated/seaborn.distplot.html). This scales the histogram down to fit the KDE. – JohanC Mar 22 '20 at 21:18
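The first and third comments combine into a short recipe. Take the sample mean and standard deviation (no iterative fitting), then rescale the PDF so it is expressed in counts rather than density. For equal-width bins, the expected count in a bin is roughly N times the bin width times the PDF at that point. Below is a minimal sketch under that equal-width assumption. The helper scaled_normal_curve and the synthetic data are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def scaled_normal_curve(data, edges, num_points=500):
    """Hypothetical helper: normal PDF from sample moments, rescaled to counts.

    Assumes equal-width bins: expected count ~ N * bin_width * pdf(x).
    """
    data = np.asarray(data, dtype=float)
    mu, sigma = data.mean(), data.std(ddof=1)  # sample mean and standard deviation
    bin_width = edges[1] - edges[0]            # only valid for equal-width bins
    x = np.linspace(edges[0], edges[-1], num_points)
    y = len(data) * bin_width * norm.pdf(x, loc=mu, scale=sigma)
    return x, y

rng = np.random.default_rng(0)
data = rng.normal(loc=4, scale=4, size=1000)  # synthetic example data
counts, edges = np.histogram(data, bins=20)
x, y = scaled_normal_curve(data, edges)
```

The returned arrays can be overlaid on ax.hist(data, bins=edges) with ax.plot(x, y). For unequal bins this counts-based scaling breaks down. One alternative, not raised in the comments, is to pass density=True to np.histogram or ax.hist: the bar heights then become densities, and norm.pdf can be plotted without any rescaling.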


You can fit your histogram using a Gaussian (i.e. normal) distribution, for example using scipy's curve_fit. I have written a small example below. Note that depending on your data, you may need to find a way to make good guesses for the starting values for the fit (p0). Poor starting values may cause your fit to fail.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
from scipy.stats import norm

def fit_func(x, a, mu, sigma, c):
    """Gaussian function used for the fit: a scaled normal PDF plus a constant offset c."""
    return a * norm.pdf(x, loc=mu, scale=sigma) + c

# make up some normally distributed data and compute a histogram
y = 2 * np.random.normal(loc=1, scale=2, size=1000) + 2
no_bins = 20
hist, edges = np.histogram(y, bins=no_bins)  # counts and the no_bins + 1 bin edges
# bin centers: left edges shifted by half a bin width
centers = edges[:-1] + (edges[1] - edges[0]) / 2

# fit the histogram
p0 = [2, 0, 2, 2]  # starting values for the fit: a, mu, sigma, c
p1, _ = curve_fit(fit_func, centers, hist, p0, maxfev=10000)

# plot the histogram and the fitted curve together
fig, ax = plt.subplots()
ax.hist(y, bins=edges)  # reuse the same edges so the bars match the fitted data
x = np.linspace(edges[0], edges[-1], 1000)
y_fit = fit_func(x, *p1)
ax.plot(x, y_fit, 'r-')
plt.show()

[Figure: Histogram with Gaussian fit]
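On the point about starting values: one way to avoid hard-coding p0 is to derive it from the data itself. The amplitude guess is the area under a counts histogram with equal-width bins (N times the bin width). The mean and spread guesses are the sample moments, and the offset guess is 0. Below is a sketch under that equal-width assumption. initial_guess is a hypothetical helper, and fit_func repeats the model from the answer:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def fit_func(x, a, mu, sigma, c):
    """Scaled normal PDF plus a constant offset (same model as above)."""
    return a * norm.pdf(x, loc=mu, scale=sigma) + c

def initial_guess(data, edges):
    """Hypothetical helper: derive p0 from the raw data instead of hard-coding it."""
    data = np.asarray(data, dtype=float)
    bin_width = edges[1] - edges[0]      # assumes equal-width bins
    a0 = len(data) * bin_width           # area under the counts histogram
    return [a0, data.mean(), data.std(ddof=1), 0.0]

rng = np.random.default_rng(1)
y = 2 * rng.normal(loc=1, scale=2, size=1000) + 2   # same recipe as the answer's data
hist, edges = np.histogram(y, bins=20)
centers = edges[:-1] + np.diff(edges) / 2           # bin midpoints
p0 = initial_guess(y, edges)
p1, _ = curve_fit(fit_func, centers, hist, p0=p0)
```

As a cross-check, scipy.stats.norm.fit(y) returns maximum-likelihood estimates (loc, scale) directly from the raw data, without any binning.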

Andrew