What's the equivalent of fitdist and histfit in Python?

Question

--- SAMPLE ---

I have a data set (sample) that contains 1,000 damage values (all very small, < 1e-6) in a one-dimensional array (see the attached .json file). The sample seems to follow a lognormal distribution:

--- PROBLEM & WHAT I ALREADY TRIED ---

I tried the suggestions in the posts "Fitting empirical distribution to theoretical ones with Scipy (Python)?" and "Scipy: lognormal fitting" to fit a lognormal distribution to my data. Neither works. :(

I always get something very large on the Y-axis, as shown below:

Here is the code that I used in Python (and the data.json file can be downloaded from here):

from matplotlib import pyplot as plt
from scipy import stats as scistats
import json
with open("data.json", "r") as f:
  sample = json.load(f) # load data: a 1000 * 1 array with many small values( < 1e-6)
fig, axis = plt.subplots() # initiate a figure
N, nbins, patches = axis.hist(sample, bins = 40) # plot sample by histogram
axis.ticklabel_format(style = 'sci', scilimits = (-3, 4), axis = 'x') # make the X-axis use scientific notation
axis.set_xlabel("Value")
axis.set_ylabel("Count")    
plt.show()

fig, axis = plt.subplots()
param = scistats.lognorm.fit(sample) # fit data by Lognormal distribution
pdf_fitted = scistats.lognorm.pdf(nbins, * param[: -2], loc = param[-2], scale = param[-1]) # evaluate the fitted PDF at the bin edges for plotting
axis.plot(nbins, pdf_fitted) # draw fitted distribution on the same figure
plt.show()

I tried other kinds of distributions, but when I plot the result, the Y-axis values are always too large to overlay on my histogram. Where did I go wrong?

I have also tried the suggestion from another question of mine: "Use scipy lognormal distribution to fit data with small values, then show in matplotlib". But the values in pdf_fitted are always too big.

--- EXPECTING RESULT ---

Basically, what I want is like this:

And here is the Matlab code that I used in the above screenshot:

fname = 'data.json';
sample = jsondecode(fileread(fname));

% fitting distribution
pd = fitdist(sample, 'lognormal')

% A combined command for plotting histogram and distribution
figure();
histfit(sample,40,"lognormal")

So if you have any idea of the equivalent command of fitdist and histfit in Python/Scipy/Numpy/Matplotlib, please post it !

Thanks a lot !

erdogant · Answer 1 · 2022-03-05T09:09:56.350

6

Try the distfit (or fitdist) library.

https://erdogant.github.io/distfit

pip install distfit

import numpy as np

# Example data
X = np.random.normal(10, 3, 2000)
y = [3,4,5,6,10,11,12,18,20]

# From the distfit library import the class distfit
from distfit import distfit

# Initialize
dist = distfit()

# Search for the best theoretical fit on your empirical data
dist.fit_transform(X)

# Plot
dist.plot()

# Summary plot
dist.plot_summary()

So in your case it would be:

dist = distfit(distr='lognorm')
dist.fit_transform(X)

edited Mar 05 '22 at 09:09

answered Jun 11 '20 at 22:33

erdogant

How do you set floc = 0 using distfit? I couldn't find that in the docs. – Simd Jul 13 '20 at 09:11
There is no input parameter to force this to a specified value. For a normal distribution, the location (loc) keyword specifies the mean and the scale keyword specifies the standard deviation, but other distributions use them to shift and/or scale the distribution, so setting them manually may not be straightforward. The fun part is that you do not have to worry about these parameters: distfit estimates them for you. – erdogant Jul 16 '20 at 07:46
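
For comparison, SciPy itself accepts floc=0 directly in lognorm.fit. The sketch below is not part of distfit; it uses synthetic data as a stand-in for data.json (the values -16.5 and 0.6 are assumptions chosen only to give similarly small numbers). With the location pinned at zero, log(scale) and the shape parameter correspond to the mean and standard deviation of log(sample), which is the role that mu and sigma play in a two-parameter lognormal fit.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the real data (assumed values, not the asker's file):
# 1000 small lognormal values of roughly the same magnitude.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=-16.5, sigma=0.6, size=1000)

# floc=0 pins the location at zero, leaving the two-parameter lognormal.
shape, loc, scale = stats.lognorm.fit(sample, floc=0)

mu = np.log(scale)  # estimate of the mean of log(sample)
sigma = shape       # estimate of the standard deviation of log(sample)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
```

Without floc=0, SciPy also estimates a location shift, and the shape and scale it returns no longer map onto mu and sigma this simply.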

score 0 · Answer 2 · answered Jan 23 '19 at 16:38

0

Try seaborn:

import seaborn as sns, numpy as np
sns.set(); np.random.seed(0)
x = np.random.randn(100)
ax = sns.distplot(x)

answered Jan 23 '19 at 16:38

Filipe Aleixo

Hi, is there a solution that uses only the more common modules like scipy or numpy? – yoursbh Jan 23 '19 at 17:45
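
A minimal sketch with only NumPy, SciPy and Matplotlib, under two assumptions: the data is replaced by synthetic lognormal values (the real data.json is not reproduced here), and the goal is a histfit-style plot where the fitted curve is drawn on the count axis. A PDF integrates to 1, so for values around 1e-7 its height is on the order of 1e7, which would explain the very large Y-axis in the question. Multiplying the PDF by the number of samples and the bin width converts it to expected counts per bin.

```python
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt

# Synthetic stand-in for the real data (assumed values, not the asker's file).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=-16.5, sigma=0.6, size=1000)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(sample, bins=40)  # histogram of raw counts

# Fit a two-parameter lognormal (location fixed at zero).
shape, loc, scale = stats.lognorm.fit(sample, floc=0)

# Evaluate the fitted density on a fine grid spanning the histogram.
x = np.linspace(edges[0], edges[-1], 500)
pdf = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)

# Convert density to expected counts per bin so both share one Y-axis.
bin_width = edges[1] - edges[0]
expected_counts = pdf * len(sample) * bin_width

ax.plot(x, expected_counts)
ax.set_xlabel("Value")
ax.set_ylabel("Count")
```

Call plt.show() afterwards to display the figure. The alternative is to leave the PDF unscaled and pass density=True to ax.hist, which puts the histogram on the density axis instead.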

score 0 · Answer 3 · answered Oct 28 '20 at 00:42

I tried your dataset with the OpenTURNS library.

x is the list given in your JSON file.

import openturns as ot
from openturns.viewer import View
import matplotlib.pyplot as plt

# first format your list x as a sample of dimension 1
sample = ot.Sample(x,1) 

# use the LogNormalFactory to build a Lognormal distribution according to your sample
distribution = ot.LogNormalFactory().build(sample)

# draw the pdf of the obtained distribution
graph = distribution.drawPDF()
graph.setLegends(["LogNormal"])
View(graph)
plt.show()

If you want the parameters of the distribution

print(distribution)
>>> LogNormal(muLog = -16.5263, sigmaLog = 0.636928, gamma = 3.01106e-08)

You can build the histogram the same way by calling HistogramFactory, then you can add one graph to another:

graph2 = ot.HistogramFactory().build(sample).drawPDF()
graph2.setColors(['blue'])
graph2.setLegends(["Histogram"])
graph2.add(graph)
view = View(graph2)  # keep the View object; its axes are used below

Then set the axis limits if you want to zoom in:

axes = view.getAxes()
_ = axes[0].set_xlim(-0.6e-07, 2.8e-07)
plt.show()
