Fitting empirical distributions using Python

Question

I have 255 monthly returns (~21 years) of a financial asset, ranging from -22.25% to +18.09%. I am using the code from "Fitting empirical distribution to theoretical ones with Scipy (Python)?" to fit the data to a distribution and generate random numbers.

This is the histogram of the data. I believe the code above fits each candidate distribution to the data using MLE (maximum likelihood estimation), and there are about 88 different distributions in the list. My question is this: the Burr distribution, for example, is defined for positive random variables only (https://en.wikipedia.org/wiki/Burr_distribution, https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.burr.html).

But when I fit the distribution, get the parameters and plot the PDF, I get the following result:

in which the fitted distribution covers both positive and negative values.

To be honest, I don't think I fully understand the code and the implications of fitting distributions. Why would a distribution that is supposed to fit positive values only also fit negative values?

Doesn't `loc=-99.11` push the positive values into negative territory? — JohanC, Aug 27 '20 at 13:43
@JohanC I believe so. loc and scale seem to shift the values, but I am not too sure. Where can I learn more about them? — JungleDiff, Aug 27 '20 at 13:47
Well, you could draw the pdf first with standard values, then change the parameters one by one and see how the pdf changes. Draw the standard pdf in one color and the modified one in another on the same plot. Keep on experimenting and try to make sense of the documentation at scipy and wikipedia. — JohanC, Aug 27 '20 at 14:35
It's not clear why you want to identify an arbitrary theoretical distribution that can approximate your data sample. Many different distributions will fit reasonably well, but just because one fits doesn't mean your data came from that kind of distribution. If you want new random samples from the same distribution, see [Creating a random number generator for arbitrary distributions](https://people.duke.edu/~ccc14/sta-663-2016/15A_RandomNumbers.html). — TMBailey, Oct 25 '21 at 07:32
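
The point about `loc` raised in the comments can be checked directly. The following is a minimal sketch, assuming SciPy is installed. The shape parameters `c` and `d` and the `scale` value are hypothetical, chosen only for illustration; only `loc=-99.11` comes from the comment above. It shows that SciPy's `burr` is positive-only in its standard form, and that `loc` and `scale` move the whole support.

```python
import numpy as np
from scipy import stats

# Hypothetical shape and scale values, for illustration only.
# loc is the value quoted in the comment thread.
c, d = 10.0, 5.0
loc, scale = -99.11, 100.0

standard = stats.burr(c, d)                       # loc=0, scale=1
shifted = stats.burr(c, d, loc=loc, scale=scale)  # Y = loc + scale * X

# support() returns the lower and upper end of the distribution's domain
std_lo, std_hi = standard.support()
shift_lo, shift_hi = shifted.support()
print("standard support:", std_lo, std_hi)
print("shifted support: ", shift_lo, shift_hi)

# The shifted pdf is the standard pdf evaluated at (x - loc) / scale,
# divided by scale, so negative x can have positive density.
x = np.array([-5.0, 0.0, 5.0])
manual = standard.pdf((x - loc) / scale) / scale
same = np.allclose(shifted.pdf(x), manual)
print("pdf at x:", shifted.pdf(x), "matches manual formula:", same)
```

So when a fit returns a negative `loc`, the lower bound of the fitted distribution is `loc` rather than 0, which would explain a "positive-only" family covering negative returns.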

erdogant · Answer 1 · 2023-02-02T19:28:56.027

Try the distfit library. It finds the best-fitting theoretical distribution for your empirical data and returns its loc/scale parameters. You can set the directionality to test for significance (upper/lower bounds). The fitted model can be used to generate new samples.

pip install distfit

# Import libraries
import numpy as np
from distfit import distfit

# Create some random data for demonstration purposes. Suppose that X is your data.
X = np.random.normal(0, 2, 10000)

# Initialize with default settings
model = distfit(bound='both')

# Fit to find the best theoretical distribution
model.fit_transform(X)

# Plot the empirical histogram together with the fitted PDF
model.plot(
    pdf_properties={'color': 'r', 'linewidth': 4},
    bar_properties={'color': '#1e3f5a', 'edgecolor': 'k'})
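
For comparison, the fit-then-sample idea can also be sketched in plain SciPy. This is not distfit's internal implementation, only an illustration under assumptions: a normal distribution is fitted because the demo data above is normal, and the sample size of 255 is borrowed from the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the data, mirroring the demo data above
X = rng.normal(0, 2, 10000)

# Maximum likelihood fit; for the normal family this returns (loc, scale)
loc, scale = stats.norm.fit(X)

# Freeze the fitted distribution and draw new samples from it
fitted = stats.norm(loc=loc, scale=scale)
new_samples = fitted.rvs(size=255, random_state=rng)

# Kolmogorov-Smirnov test as a rough check of the fit. The p-value is
# optimistic because the parameters were estimated from the same data.
ks_stat, p_value = stats.kstest(X, fitted.cdf)
print(f"loc={loc:.3f}, scale={scale:.3f}, KS={ks_stat:.4f}, p={p_value:.3f}")
```

With real return data the same pattern applies to any `scipy.stats` continuous distribution, since `fit` returns any shape parameters followed by `loc` and `scale`.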

Disclaimer: I am also the author of this repo.
