I'm trying to reproduce the example given in this answer to fit a distribution to my real data. For instance, df['Note'] is one column of my dataframe and contains the average note for every student:

Index     StudentName     Note
0          Mark            3.7
1          Bryan           3.5
2          Nil             0.1
3          Amanda          2.045
4          Arthur          1.2
5          Helen           2.5

The real dataframe contains about 350000 rows, with a mean of 2.17, as in this sample.

So here is my code to obtain the fitted distribution:

import os

import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats


def fit(dataframe, path):
    dataframe.set_index
    size=len(dataframe.index)
    x=dataframe.index
    y=dataframe['Note']
    plt.hist(y, bins=range(20))
    dist_names = ['alpha', 'beta', 'norm', 'expon']
    for dist_name in dist_names:
        dist = getattr(scipy.stats, dist_name)
        param = dist.fit(y)
        pdf_fitted = dist.pdf(x, *param[:-2], loc=param[-2], scale=param[-1]) * size
        plt.plot(pdf_fitted, label=dist_name)
        plt.xlim(0, 19)
        plt.legend(loc='upper left')
    plt.show()
    plt.savefig(path+'_fit.png', bbox_inches='tight', dpi=100)


def call_fit(pathname):
    path_picture="%spicture//" %pathname
    path="%sdata//" %pathname
    path_s=sorted(os.listdir(path))
    for i in path_s:
        file_path=os.path.join(path, i)
        picture=os.path.join(path_picture, i)
        df=pd.read_csv(file_path, sep='\t')
        fit(df, picture)

but the output is very strange. I am posting both functions because I'm no longer sure which one causes the bug. The fitted distributions look strange, and moreover the histogram of my real data isn't plotted. On top of that, something odd happens to the legend on every iteration: compare the first plot with the 10th one:

[Plots: first iteration and 10th iteration]

Your help and advice would be much appreciated, because I have no idea where I went wrong.

Amanda
  • @SaulloCastro I've noticed you are a specialist in matplotlib and scipy, and I reproduced your code on my example. Could you help me with my problem? – Amanda Mar 04 '16 at 06:31

1 Answer

The strange appearance of the fitted curves is likely due to your data not fitting those distributions well (in particular the alpha distribution). Since I do not have your data, I can't go into further detail. It looks good enough with the sample you provided.

The histograms are plotted, but the bars are far too small compared to the 1e29 scale of the plot, so you can't see them.
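
The scale effect can be reproduced with made-up numbers. The sketch below is only an illustration: the notes are synthetic, and the curve with a single value of 1e29 is a hypothetical stand-in for whatever pushed the y-axis that high in your plot. The `Agg` backend is also an assumption, chosen so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
notes = rng.uniform(0, 4, size=1000)  # synthetic stand-in for df['Note']

fig, ax = plt.subplots()
# Bar heights are raw counts, so none can exceed len(notes).
counts, _, _ = ax.hist(notes, bins=range(20))
# Hypothetical curve containing one huge value; it forces the y-axis up to about 1e29.
ax.plot([0.0, 1e29, 0.0])

y_top = ax.get_ylim()[1]
# Fraction of the axis height occupied by the tallest bar: far below one pixel.
print(counts.max() / y_top)
plt.close(fig)
```

The bars are drawn, but on an axis that tall they are far thinner than a pixel, which matches what you see.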

The legend and other plot elements accumulate because you always plot into the same subplot without clearing it, so each iteration keeps everything plotted before.
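
One way to avoid the accumulation is to create a fresh figure on every call and close it after saving. The following is a minimal sketch, not a drop-in replacement for your function: `fit_and_plot` is a hypothetical name, the data are synthetic (only the mean of 2.17 is taken from the question, the spread is made up), the histogram is normalised with `density=True` instead of multiplying the pdf by `size`, and the pdf is evaluated over the range of the values rather than over the row index. It also saves before any call to `plt.show()`, since with some backends the figure is already gone once the window is closed.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats


def fit_and_plot(values, out_path, dist_names=("norm", "expon")):
    """Fit each distribution, plot onto a NEW figure, save it, and close it."""
    fig, ax = plt.subplots()  # fresh figure and axes on every call
    ax.hist(values, bins=20, density=True, alpha=0.5, label="data")
    grid = np.linspace(values.min(), values.max(), 200)
    for name in dist_names:
        dist = getattr(scipy.stats, name)
        params = dist.fit(values)  # (shape parameters..., loc, scale)
        ax.plot(grid, dist.pdf(grid, *params), label=name)
    ax.legend(loc="upper left")
    n_entries = len(ax.get_legend().get_texts())
    fig.savefig(out_path, bbox_inches="tight", dpi=100)
    plt.close(fig)  # release the figure so nothing carries over to the next call
    return n_entries


# Synthetic data: mean 2.17 as in the question, standard deviation is an assumption.
rng = np.random.default_rng(0)
sample = rng.normal(2.17, 0.8, size=1000)
out_dir = tempfile.mkdtemp()
legend_sizes = [
    fit_and_plot(sample, os.path.join(out_dir, "run%d_fit.png" % i))
    for i in range(3)
]
```

With this structure the legend should hold the same entries on every iteration (the data plus one per distribution) instead of growing. Calling `plt.clf()` at the start of your existing function would be an alternative if you prefer to keep using the implicit current figure.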

Stop harming Monica