I have fit a distribution to my data using scipy.stats.lognorm, and now I am trying to plot the distribution. I generated the fit to my data with seaborn:

ax = sns.distplot(1 - clint_unique_cov_filter['Identity'], kde=False, hist=True, 
                  norm_hist=True, fit=lognorm, bins=np.linspace(0, 1, 500))
ax.set_xlim(0, 0.1)

Which gets me the fit I expect:

[Image: seaborn fit]

I need to use the parameters of this distribution for further analysis, but first I wanted to verify that I understood the terms. This post suggests the following transformations to turn the output of lognorm.fit into the standard mu and sigma parameters of a lognormal:

shape, loc, scale = lognorm.fit(1 - clint_unique_cov_filter['Identity'])
mu = np.log(scale)
sigma = shape
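
As a sanity check on this mapping, here is a minimal sketch on synthetic data. The sample and its "true" parameters are made up for illustration and are not the Identity data from the question. It also assumes loc is pinned at 0 with floc=0, because mu = log(scale) and sigma = shape describe log(x - loc); they only match the usual two-parameter lognormal when loc is 0.

```python
import numpy as np
from scipy.stats import lognorm

# Synthetic sample with known parameters (hypothetical values, illustration only).
rng = np.random.default_rng(0)
true_mu, true_sigma = -4.0, 0.5
data = rng.lognormal(mean=true_mu, sigma=true_sigma, size=20_000)

# Pin loc at 0 so the fit is a plain two-parameter lognormal.
shape, loc, scale = lognorm.fit(data, floc=0)

mu = np.log(scale)  # estimate of the mean of log(data)
sigma = shape       # estimate of the standard deviation of log(data)
print(mu, sigma)
```

If loc is left free, as in the fit above, the same formulas still apply, but to the shifted variable x - loc rather than to x itself.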

But when I try to plot this, I do not get the distribution I expect. To double check, I tried just sticking the original values back into a plot, but the distribution is noticeably different:

s, l, sc = lognorm.fit(1 - clint_unique_cov_filter['Identity'])
rv = lognorm(s, l, sc)
plt.plot(np.linspace(0, 0.1), rv.pdf(np.exp(np.linspace(0, 0.1))))

[Image: incorrect distribution]

Why is this distribution not the same as the one seaborn produces?

EDIT:

Reading the seaborn code led me to my answer:

params = lognorm.fit(1 - clint_unique_cov_filter['Identity'])
xvals = np.linspace(0, 0.1)
pdf = lambda x: lognorm.pdf(x, *params)
yvals = pdf(xvals)
plt.plot(xvals, yvals)

This provides the correct plot:

[Image: plot following seaborn's method of fitting]
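
For comparison with the earlier attempt, the sketch below evaluates the density three ways without plotting. The parameter values are hypothetical stand-ins for the output of lognorm.fit, not the values fitted to the real data. It suggests that freezing the distribution with lognorm(s, l, sc) is equivalent to passing the parameters explicitly, and that the difference comes from evaluating the pdf at np.exp(x) instead of at x.

```python
import numpy as np
from scipy.stats import lognorm

# Hypothetical parameters standing in for the output of lognorm.fit.
s, loc, scale = 0.6, -0.002, 0.02

xvals = np.linspace(0, 0.1, 50)
frozen = lognorm(s, loc, scale)

same_grid = frozen.pdf(xvals)                 # frozen distribution on the linear grid
explicit = lognorm.pdf(xvals, s, loc, scale)  # explicit parameters, same grid
exp_grid = frozen.pdf(np.exp(xvals))          # density at exp(x), i.e. at values >= 1

print(np.allclose(same_grid, explicit))  # the two calling styles agree
print(np.allclose(same_grid, exp_grid))  # evaluating at exp(x) does not
```

Plotting rv.pdf(np.exp(x)) against x pairs each x with the density at a different point, so the curve is not the fitted pdf over the range shown.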

Ian Fiddes
  • Generally you want to use `floc=0` in `scipy.stats.lognorm.fit()`; see, for example, http://stackoverflow.com/questions/32507117/fitting-and-plotting-lognormal/32507756#32507756, http://stackoverflow.com/questions/26406056/a-lognormal-distribution-in-python/26442781#26442781, and probably a few others if you search for `[scipy] lognorm`. – Warren Weckesser Oct 12 '16 at 03:06
  • Does `seaborn` use floc=0? I tried it both ways, neither provides the distribution `seaborn` produced. – Ian Fiddes Oct 12 '16 at 03:40
  • I guess not, but from [the docstring](https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.distplot.html), it looks like you could use the argument `fit_kws=dict(floc=0)`. (I haven't tried it.) – Warren Weckesser Oct 12 '16 at 03:57
  • Are you sure the plot produced by `distplot` is what you expect? It looks like the graph is cut off at the y axis, which suggests it would continue into negative x range before hitting 0. It looks like `distplot`'s fit resulted in a negative value of the `loc` parameter. – Warren Weckesser Oct 12 '16 at 04:00
  • See the edits. You have a valid point about the loc parameter. I may want to set that to 0. However, that is not possible with the `fit_kws` argument to `distplot`; that dictionary only affects the plotting parameters. https://github.com/mwaskom/seaborn/blob/14078fe9b4bb0b6b6fc957d6dfa0d18dc0adbbef/seaborn/distributions.py#L232-L246 – Ian Fiddes Oct 12 '16 at 04:02
  • I see. So to use `floc=0`, you would have to create an object with a `fit` method that called `lognorm.fit` with `floc=0`. – Warren Weckesser Oct 12 '16 at 04:07
  • Yup, it appears that way. This is probably the way to go, in order to get the best fit. – Ian Fiddes Oct 12 '16 at 04:08
  • Ian, if you answer your own question and accept the answer, the answer will be easier to find *and* your question will no longer appear unanswered. – Ulrich Stern Oct 15 '16 at 23:47
