5

I have written the below code to fit a Gaussian curve to a histogram. It seems to work, although the Y scaling is different. What am I doing wrong?

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab

list = [0,1,1,2,2,2,3,3,4]

plt.figure(1)
plt.hist(list)
plt.xlim((min(list), max(list)))

mean = np.mean(list)
variance = np.var(list)
sigma = np.sqrt(variance)
x = np.linspace(min(list), max(list),100)
plt.plot(x,mlab.normpdf(x,mean,sigma))

plt.show()

Thanks!

El Confuso
  • 1,941
  • 6
  • 19
  • 22

1 Answers1

14

You need to normalize the histogram, since the distribution you plot is also normalized:

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab

arr = np.random.randn(100)

plt.figure(1)
plt.hist(arr, density=True)
plt.xlim((min(arr), max(arr)))

mean = np.mean(arr)
variance = np.var(arr)
sigma = np.sqrt(variance)
x = np.linspace(min(arr), max(arr), 100)
plt.plot(x, mlab.normpdf(x, mean, sigma))

plt.show()

Note the density=True in the call to plt.hist. Note also that I changed your sample data, because the histogram looks weird with too few data points.

If you instead want to keep the original histogram and rather adjust the distribution, you have to scale the distribution such that the integral over the distribution equals the integral of the histogram, i.e. the number of items in the list multiplied by the width of the bars. This can be achieved like

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab

arr = np.random.randn(1000)

plt.figure(1)
result = plt.hist(arr)
plt.xlim((min(arr), max(arr)))

mean = np.mean(arr)
variance = np.var(arr)
sigma = np.sqrt(variance)
x = np.linspace(min(arr), max(arr), 100)
dx = result[1][1] - result[1][0]
scale = len(arr)*dx
plt.plot(x, mlab.normpdf(x, mean, sigma)*scale)

plt.show()

Note the scale factor calculated from the number of items times the width of a single bar.

Tomás Ferrer
  • 55
  • 1
  • 6
David Zwicker
  • 23,581
  • 6
  • 62
  • 77
  • Thanks for that, it works well. Is there a way that rather than normalizing the histogram to the distribution, I could just not normalize the distribution in the first place? I'm very much a beginner at Python, let alone MatPlotLib. – El Confuso May 03 '14 at 17:59
  • 1
    You can of course multiply the distribution by the total number of items divided by the number of bins in the histogram. That should get you what you want. – David Zwicker May 03 '14 at 19:34
  • Can I trouble you for an example please? – El Confuso May 03 '14 at 20:25
  • See the second example in my answer. – David Zwicker May 04 '14 at 08:41
  • 1
    `list = np.random.randn(1000)` this code is redefining the list() keyword. This is an awful practice and will create really hard to debug problems. – Lidia Freitas Nov 26 '16 at 15:07
  • `matplotlib.pyplot.hist` doesn't have a parameter called `normed` anymore (it was deprecated in matplotlib version 3.0.2). It's been replaced with the `density` parameter.. – Tomás Ferrer Jun 01 '21 at 20:57
  • It work, but normpdf is deprecated. Now you have to import >>> import scipy >>> from scipy.stats import norm and update the plt.plot to: >>>plt.plot(x, norm.pdf(x, mean, sigma)*scale) – Diving Aug 23 '21 at 14:39