
I need to fit a curve to my histogram, but the plot only shows the histogram, not the fitted curve. This is my code:

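from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

# Load the single column of diameters from the text file 'data Ties'
# (unpack='False' was a truthy string; the default is what is wanted here)
f = np.loadtxt('data Ties')

bins = [0, 1000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000]

# Maximum-likelihood estimates of the normal mean and standard deviation
(mu, sigma) = norm.fit(f)

#plt.hist(f, bins=bins, histtype='bar')

# density=True is the current name of the old normed=True option
plt.hist(f, bins=bins, histtype='bar', density=True)

# norm.pdf replaces mlab.normpdf, which has been removed from Matplotlib
y = norm.pdf(bins, mu, sigma)
plt.plot(bins, y, 'r--', linewidth=1, label='normal fit')

plt.xlabel('Diameter (Micrometer)')
plt.ylabel('Number of Chondrules')
plt.title('Distribution of chondrules diameter')
plt.legend()  # needs a label= on at least one artist, as set above
plt.grid(True)
plt.show()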

It does plot the fit, but it does not look good. Here is a sample of my data:

168000
199300
120900
216900
200800
137800
214200
174600
48200
126500
58700
149500
47500
5600
178500
25400
163000
182000
51900
66700
90300
210600
117800
164000
215200
170000
182000
38800
72700
161200
31000

I also want to know the mean and sigma.
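Both numbers are already computed by norm.fit in the code above; they only need to be printed. The sketch below uses just the 31 sample values posted here, so the full 'data Ties' file will give different numbers. For a normal distribution, norm.fit returns the maximum-likelihood estimates, which are the sample mean and the standard deviation with ddof=0, so plain NumPy agrees:

```python
import numpy as np
from scipy.stats import norm

# The 31 sample values posted in the question (not the full data file)
sample = np.array([
    168000, 199300, 120900, 216900, 200800, 137800, 214200, 174600,
    48200, 126500, 58700, 149500, 47500, 5600, 178500, 25400,
    163000, 182000, 51900, 66700, 90300, 210600, 117800, 164000,
    215200, 170000, 182000, 38800, 72700, 161200, 31000,
])

# Maximum-likelihood fit of a normal distribution: returns (loc, scale)
mu, sigma = norm.fit(sample)

print(f"mean  = {mu:.1f}")
print(f"sigma = {sigma:.1f}")

# Same numbers from NumPy: the mean and the population (ddof=0) std
print(f"np.mean = {np.mean(sample):.1f}, np.std = {np.std(sample):.1f}")
```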


Yasamin
  • See http://stackoverflow.com/questions/7805552/fitting-a-histogram-with-python – Louis Jun 02 '16 at 12:53
  • A cruise through the [Matplotlib Gallery](http://matplotlib.org/examples/statistics/histogram_demo_features.html) turned up an example that fits a curve to a histogram. – wwii Jun 02 '16 at 14:35
  • What exactly is it that you don't find nice about the plot? As I stated in my answer, the bins are off, and you are only plotting around half of your distribution. If that is intentional, then it is probably as nice as it will get. If the problem is not seeing the entire distribution, then adjust the bins as shown in my answer. If the problem is that the normal pdf does not really fit the histogram, then that is because your data do not look normally distributed. Try kernel density estimates. – Martin Stancsics Jun 02 '16 at 14:36
  • I looked at these links and did the same, but the pdf still does not fit the histogram well. My data are fine; I don't understand what you mean when you say the problem is in the data. – Yasamin Jun 03 '16 at 12:35
  • You are using bins ranging from 0 to 100,000. At the same time, there are values in your distribution well above 100,000, which means you are not getting the full picture: what you are seeing is only half of the distribution. Based on the sample data, you are plotting [the left side](http://imgur.com/yfTtr8m). If you adjust the bins, you get the [full distribution](http://imgur.com/P8Kmx1T). However, the line still does not fit, because the data do not look normally distributed, while the line is the estimated *normal* distribution. If you want a better fit, [use a KDE](http://imgur.com/4fajZVO). – Martin Stancsics Jun 03 '16 at 19:15
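The point about the bin range is easy to check numerically. A minimal sketch on the posted sample only; fraction_inside is a hypothetical helper, not part of any library. np.histogram drops values outside the outermost edges without any warning, and plt.hist behaves the same way:

```python
import numpy as np

# The 31 sample values posted in the question
sample = np.array([
    168000, 199300, 120900, 216900, 200800, 137800, 214200, 174600,
    48200, 126500, 58700, 149500, 47500, 5600, 178500, 25400,
    163000, 182000, 51900, 66700, 90300, 210600, 117800, 164000,
    215200, 170000, 182000, 38800, 72700, 161200, 31000,
])

old_bins = [0, 1000, 10000, 20000, 30000, 40000, 50000,
            60000, 70000, 80000, 90000, 100000]
new_bins = np.arange(0, 260000, 20000)  # edges 0, 20000, ..., 240000


def fraction_inside(data, edges):
    """Hypothetical helper: share of the data counted by a histogram with these edges."""
    counts, _ = np.histogram(data, bins=edges)
    return counts.sum() / len(data)


print(f"old bins cover {fraction_inside(sample, old_bins):.0%} of the sample")
print(f"new bins cover {fraction_inside(sample, new_bins):.0%} of the sample")
```

On this sample the original bins contain only 11 of the 31 values, while np.arange(0, 260000, 20000) contains all of them.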

1 Answer


The pdf is on the graph, but it is very small. Look at the values of y:

array([  8.83232194e-07,   9.10034504e-07,   1.17854794e-06,
         1.53635735e-06,   1.95663517e-06,   2.43444572e-06,
         2.95912259e-06,   3.51397321e-06,   4.07667940e-06,
         4.62048215e-06,   5.11611925e-06,   5.53435032e-06])
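To make the size mismatch concrete, the sketch below compares the raw bin counts with the pdf values on the posted sample (not the full file, so the numbers differ from the array above). It also shows one way to do it "the other way around": instead of normalizing the histogram, turn the fitted normal into expected counts per bin, N * (CDF(right edge) - CDF(left edge)), which stays correct for the uneven bin widths used in the question. This conversion is an illustration, not the approach this answer goes on to recommend:

```python
import numpy as np
from scipy.stats import norm

# The 31 sample values posted in the question
sample = np.array([
    168000, 199300, 120900, 216900, 200800, 137800, 214200, 174600,
    48200, 126500, 58700, 149500, 47500, 5600, 178500, 25400,
    163000, 182000, 51900, 66700, 90300, 210600, 117800, 164000,
    215200, 170000, 182000, 38800, 72700, 161200, 31000,
])
# The question's original (uneven) bin edges
bins = np.array([0, 1000, 10000, 20000, 30000, 40000, 50000,
                 60000, 70000, 80000, 90000, 100000])

mu, sigma = norm.fit(sample)

# Raw histogram heights: how many values fall in each bin (small integers)
counts, _ = np.histogram(sample, bins=bins)

# The pdf is a density per unit of diameter; with sigma in the tens of
# thousands its values are of order 1e-6, invisible next to the counts
pdf_at_edges = norm.pdf(bins, mu, sigma)

# Expected number of values per bin under the fitted normal
expected = len(sample) * np.diff(norm.cdf(bins, mu, sigma))

print("largest bin count:     ", counts.max())
print("largest pdf value:     ", pdf_at_edges.max())
print(f"largest expected count: {expected.max():.2f}")
```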

So you have to scale your histogram so that it is comparable to the pdf (or the other way around). The easiest way to do this is to set the normed option to True in plt.hist() (newer Matplotlib versions call this option density; normed has been removed):

plt.hist(f, bins=bins, histtype='bar', density=True)

and you should be set. Unfortunately, the graph will still not look good, because the bins you chose are not particularly suitable for this dataset. For example, the largest bin edge (100,000) is still below the mean of the data, so every value above it is simply left out of the histogram. Try

bins = np.arange(0, 260000, 20000)

for a slightly better looking graph.
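Putting these suggestions together, here is a sketch of the complete plot in current Matplotlib, again on the posted sample. Assumptions beyond the answer: the Agg backend and savefig (so it runs without a display), the output filename chondrules_hist.png (hypothetical), and evaluating the pdf on a fine np.linspace grid instead of only at the bin edges, which gives a smooth curve:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

# The 31 sample values posted in the question
sample = np.array([
    168000, 199300, 120900, 216900, 200800, 137800, 214200, 174600,
    48200, 126500, 58700, 149500, 47500, 5600, 178500, 25400,
    163000, 182000, 51900, 66700, 90300, 210600, 117800, 164000,
    215200, 170000, 182000, 38800, 72700, 161200, 31000,
])

mu, sigma = norm.fit(sample)

# Evenly spaced bins that reach past the largest value
bins = np.arange(0, 260000, 20000)

fig, ax = plt.subplots()
# density=True (formerly normed=True) scales the bars so their total area is 1
heights, edges, _ = ax.hist(sample, bins=bins, density=True, alpha=0.6, label='data')

# Evaluate the pdf on a fine grid for a smooth curve
x = np.linspace(edges[0], edges[-1], 500)
ax.plot(x, norm.pdf(x, mu, sigma), 'r--', linewidth=1,
        label=f'normal fit: mu={mu:.0f}, sigma={sigma:.0f}')

ax.set_xlabel('Diameter (Micrometer)')
ax.set_ylabel('Probability density')
ax.set_title('Distribution of chondrules diameter')
ax.legend()
ax.grid(True)
fig.savefig('chondrules_hist.png')  # hypothetical filename
```

In an interactive session, drop the matplotlib.use line and call plt.show() instead of savefig.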

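If you want a really nice distribution plot, I suggest trying a specialized package such as seaborn, which makes it a one-liner (note that sns.distplot has been deprecated since seaborn 0.11 in favor of sns.histplot and sns.displot, so newer versions emit a warning):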

import seaborn as sns
sns.distplot(f, fit=norm, kde=False)
plt.show()

or, if you want to fit a kernel density estimate instead of a (not particularly well-fitting) normal distribution, it is as simple as

sns.distplot(f)
plt.show()
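If seaborn is not installed, or you want the density values themselves, SciPy's gaussian_kde computes the same kind of kernel density estimate. This swaps in scipy.stats.gaussian_kde rather than seaborn, so the bandwidth and evaluation grid may differ from what sns.distplot draws; the sketch uses SciPy's default bandwidth (Scott's rule) and the posted sample only:

```python
import numpy as np
from scipy.stats import gaussian_kde

# The 31 sample values posted in the question
sample = np.array([
    168000, 199300, 120900, 216900, 200800, 137800, 214200, 174600,
    48200, 126500, 58700, 149500, 47500, 5600, 178500, 25400,
    163000, 182000, 51900, 66700, 90300, 210600, 117800, 164000,
    215200, 170000, 182000, 38800, 72700, 161200, 31000,
])

# Gaussian kernel density estimate with the default (Scott's rule) bandwidth
kde = gaussian_kde(sample)

# Kernel standard deviation, used to pad the grid so the tails are included
bw = np.sqrt(kde.covariance[0, 0])
x = np.linspace(sample.min() - 3 * bw, sample.max() + 3 * bw, 500)
density = kde(x)

# Rough Riemann-sum check that the estimate integrates to about 1
area = density.sum() * (x[1] - x[0])
print(f"area under the KDE on the grid: {area:.3f}")
```

To overlay it on a density=True histogram, call plt.plot(x, density).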
Martin Stancsics