
On docs.scipy.org there is a code example that draws samples from a Pareto distribution and plots them. I understand most of the snippet except the use of the term 'fit' for the PDF (probability density function) and the formula max(count)*fit/max(fit).

Here's the code snippet:

```python
import numpy as np
import matplotlib.pyplot as plt

a, m = 3., 2.  # shape and mode
s = (np.random.pareto(a, 1000) + 1) * m
# density=True (older Matplotlib releases called this argument normed=True)
count, bins, _ = plt.hist(s, 100, density=True)
fit = a*m**a / bins**(a+1)
plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r')
plt.show()
```

I thoroughly searched the web for the formula max(count)*fit/max(fit), even replacing the term 'fit' with 'pdf', but could not find any leads. Kindly explain the concept the formula conveys.

I assume the term 'fit' is used in place of 'PDF' because the expression assigned to fit is the PDF formula of the Pareto distribution.

Finally, what does the underscore '_' in this line convey?

```python
count, bins, _ = plt.hist(s, 100, density=True)
```
Bipin
    The ```_``` signals that that value isn't important. ```plt.hist``` returns three values (the bar heights, the bin edges, and the drawn patches), and the last one isn't needed here. – Joshua Nixon Mar 09 '20 at 14:19

1 Answer


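np.random.pareto draws random samples from the Pareto II (Lomax) distribution; adding 1 and multiplying by m turns them into samples from the classical Pareto (Type I) distribution, whose values are all at least m. The resulting data are therefore realisations from this distribution, not its probability density.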
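A small check of what the `+ 1` and `* m` do, as a sketch assuming a seeded np.random.default_rng generator (the seed and sample size are our choices, not from the docs). For the Type I distribution the probability of exceeding x is (m/x)**a for x >= m, so the empirical fraction of samples above a few thresholds should be close to that value.

```python
import numpy as np

a, m = 3.0, 2.0                       # shape and mode, as in the question
rng = np.random.default_rng(0)        # seed chosen arbitrarily for reproducibility
lomax = rng.pareto(a, 100_000)        # Pareto II / Lomax samples, support [0, inf)
s = (lomax + 1) * m                   # shifted and scaled: Pareto I, support [m, inf)

print(lomax.min() >= 0, s.min() >= m)

# Compare the empirical survival function with the theoretical (m / x)**a
for x in (2.5, 3.0, 4.0):
    empirical = np.mean(s > x)
    theoretical = (m / x) ** a
    print(f"x={x}: empirical={empirical:.3f}, theoretical={theoretical:.3f}")
```

The Generator method rng.pareto draws from the same Lomax distribution as the legacy np.random.pareto used in the question.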

In the call to plt.hist we use the normed=True argument (called density=True in current Matplotlib). This normalises the histogram so that the total area of the bars is 1, which puts the density of our samples on the y-axis rather than the frequency.
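A quick numerical check of what density normalisation means. This sketch uses np.histogram with density=True instead of plt.hist so that nothing is drawn; plt.hist with density=True should produce the same bar heights. The seed is an arbitrary choice.

```python
import numpy as np

a, m = 3.0, 2.0
rng = np.random.default_rng(0)        # seed chosen arbitrarily
s = (rng.pareto(a, 1000) + 1) * m

# Frequencies: bar heights are raw counts
freq, edges = np.histogram(s, bins=100)
# Densities: bar heights are scaled so that the total area is 1
count, bins = np.histogram(s, bins=100, density=True)

widths = np.diff(bins)
print(freq.sum())                 # total count, equal to the number of samples
print(np.sum(count * widths))     # total area, 1 up to floating-point error
```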

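We then wish to plot the Pareto distribution on top of our sampled data. Note that, despite the variable name, nothing is fitted to the data: fit uses the same a and m that generated s, so it is simply the theoretical pdf, as you assumed.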
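For contrast with the snippet, which plugs in the known a and m, here is a minimal sketch of what actually fitting a Pareto (Type I) distribution to s could look like. It assumes the standard maximum-likelihood estimates m_hat = min(s) and a_hat = n / sum(ln(s_i / m_hat)); this estimator and the seed are our own additions, not part of the docs example.

```python
import numpy as np

a, m = 3.0, 2.0                       # true parameters used to generate the data
rng = np.random.default_rng(1)        # seed chosen arbitrarily
s = (rng.pareto(a, 1000) + 1) * m

# Maximum-likelihood estimates for a Pareto (Type I) sample (our addition)
m_hat = s.min()
a_hat = len(s) / np.sum(np.log(s / m_hat))

print(f"true a={a}, m={m}; estimated a={a_hat:.3f}, m={m_hat:.3f}")

# A pdf built from the estimates would be a fitted curve in the statistical sense
x = np.linspace(m_hat, 10, 200)
fitted_pdf = a_hat * m_hat**a_hat / x**(a_hat + 1)
```

With 1000 samples the estimates land close to the true values, which is why using the true parameters directly, as the docs example does, gives a visually similar curve.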

To do so we compute the probability density of the Pareto distribution with parameters a and m at the x-values given by bins, i.e. the bin edges returned by plt.hist. This is our definition of fit: fit = a*m**a / bins**(a+1).
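As a sanity check that this expression really is a probability density, the sketch below evaluates it on a fine grid and integrates it numerically with the trapezoidal rule. For x >= m the density is a*m**a / x**(a+1); it is largest at x = m, where it equals a/m (1.5 for a = 3, m = 2). The helper pareto_pdf, the grid, and the upper cut-off are illustrative choices of ours.

```python
import numpy as np

a, m = 3.0, 2.0

def pareto_pdf(x, a, m):
    """Pareto (Type I) density: the same expression the snippet assigns to fit."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= m, a * m**a / x**(a + 1), 0.0)

# Geometric grid from m to a large cut-off, denser near m where the pdf is steep
x = np.geomspace(m, 1e4, 200_001)
y = pareto_pdf(x, a, m)

# Trapezoidal rule: sum of 0.5 * (y[i] + y[i+1]) * (x[i+1] - x[i])
area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
tail = (m / x[-1]) ** a            # probability mass beyond the cut-off, (m/x)**a

print(area + tail)                 # close to 1
print(pareto_pdf(m, a, m), a / m)  # peak value at x = m
```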

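The necessity of the max(count) * fit / max(fit) term is more elusive. It's clear why we include fit in the plotting command, but why multiply it by the ratio max(count) / max(fit)? Mechanically, the ratio is a scale factor: after the multiplication, the peak of the red curve has exactly the height of the tallest histogram bar. Why the example does this, I'm not 100% sure.

It doesn't look like a bias correction from fitting, since no parameters are estimated from the data. With normed=True the histogram is already on the same density scale as the pdf, so the unscaled fit should line up with the bars by itself, which suggests the factor is a cosmetic adjustment. One plausible reason: fit is evaluated at the bin edges, so max(fit) is the pdf at the left edge of the first bin (roughly a/m), which is higher than the average density the first bar represents; the rescaling brings the curve's peak to the bar's height.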
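The effect of the ratio can be checked numerically. This sketch reuses the question's parameters, with np.histogram standing in for plt.hist (same densities, no plot) and a seed of our choosing. It confirms that multiplying fit by max(count) / max(fit) makes the curve's peak equal the tallest bar, and prints the unscaled pdf at the bin centres (rather than the edges) next to the first few bar heights for comparison.

```python
import numpy as np

a, m = 3.0, 2.0
rng = np.random.default_rng(0)          # seed chosen arbitrarily
s = (rng.pareto(a, 1000) + 1) * m
count, bins = np.histogram(s, bins=100, density=True)

fit = a * m**a / bins**(a + 1)          # pdf at the bin edges, as in the snippet
scale = max(count) / max(fit)           # the factor in question
scaled = scale * fit

print(f"max(count)={max(count):.3f}, max(fit)={max(fit):.3f}, scale={scale:.3f}")
print(np.isclose(max(scaled), max(count)))   # the peaks now coincide

# Unscaled pdf at bin centres, for comparison with the density-normalised bars
centres = 0.5 * (bins[:-1] + bins[1:])
pdf_at_centres = a * m**a / centres**(a + 1)
print(np.round(count[:5], 3))
print(np.round(pdf_at_centres[:5], 3))
```

Comparing the last two lines shows how closely the unscaled pdf already tracks the bars once both are on the density scale.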

jwalton
  • Thank you @Ralph, you explained it really well. However, you did not address max(count)*fit/max(fit). – Bipin Mar 09 '20 at 14:31
  • @Bipin Apologies, I missed the ```max(count) * fit / max(fit)``` bit. I can't quite work out why it's necessary. I'll ask a colleague. – jwalton Mar 09 '20 at 15:15
  • I must thank you wholeheartedly for your efforts and for your time. – Bipin Mar 09 '20 at 15:39
  • @Bipin I'd encourage posting this question to [cross validated](https://stats.stackexchange.com/). I can't seem to work this one out – jwalton Mar 09 '20 at 16:03
  • I posted it on Cross Validated. Thank you for the suggestion. – Bipin Mar 09 '20 at 16:13