I have a dataset that I would like to fit to a known probability distribution. The intention is to use the fitted PDF in a data generator, so that I can sample data from the fitted PDF. The data will be used for simulation purposes. At the moment I am just sampling from a normal distribution, which is inconsistent with the real data, so the simulation results are not accurate.

I first wanted to use the method described in: Fitting empirical distribution to theoretical ones with Scipy (Python)?

My first thought was to fit it to a Weibull distribution, but the data is actually multimodal (picture attached). So I guess I need to combine multiple distributions and then fit the data to the resulting distribution, is that right? Maybe combine a Gaussian and a Weibull distribution?

How can I use the scipy fit() function with a mixed/multimodal distribution?

Also, I would like to do this in Python (i.e. scipy/numpy/matplotlib), as the data generator is written in Python.

Many thanks!

[Image: histogram of data]

Rosh

1 Answer

I would suggest Kernel Density Estimation (KDE). It gives you the estimated density as a mixture of PDFs, with one kernel centered on each data point.

SciPy has only a Gaussian kernel (which looks fine for your specific histogram), but you can find other kernels in the statsmodels or scikit-learn packages.

For reference, these are the relevant classes:

from sklearn.neighbors import KernelDensity
from scipy.stats import gaussian_kde
from statsmodels.nonparametric.kde import KDEUnivariate
from statsmodels.nonparametric.kernel_density import KDEMultivariate

A great resource for KDE in Python is here.
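
As a minimal sketch of how this could look for the data-generator use case, the snippet below fits `scipy.stats.gaussian_kde` to a sample and then draws new values from the fitted density. The real dataset is not available here, so the `data` array is a synthetic bimodal stand-in (an assumption for illustration only), and the default bandwidth (Scott's rule) is used without any tuning.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic bimodal stand-in for the real dataset (assumption, not the asker's data):
# one Gaussian mode plus a shifted, scaled Weibull mode.
data = np.concatenate([
    rng.normal(2.0, 0.5, 600),
    rng.weibull(1.5, 400) * 3.0 + 5.0,
])

# Fit the KDE; the bandwidth defaults to Scott's rule.
kde = gaussian_kde(data)

# Evaluate the estimated PDF on a grid, e.g. to plot it over the histogram.
grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 500)
density = kde(grid)

# Draw new samples from the fitted density for the simulation.
# resample() returns an array of shape (n_dims, size), so take row 0 for 1-D data.
samples = kde.resample(1000)[0]
```

If the estimate looks too smooth or too noisy against the histogram, the `bw_method` argument of `gaussian_kde` can be used to adjust the bandwidth.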

Elad Joseph
  • Thank you Elad for your answer. I think a KDE would give me a good fit to my data. However, how do I represent the fitted KDE curve as a mathematical equation? For example, a polynomial fitted curve can be expressed as f(x) = x^2 + x + 1. Is it possible to represent the KDE obtained via `stats.gaussian_kde` as a formula, so I can put it on paper for others to reproduce/reuse? Thank you! – Rosh Oct 25 '15 at 15:28
  • Seriously a great resource – O.rka Oct 21 '16 at 17:27
  • I would like to do something similar (look [here](https://stackoverflow.com/questions/44934808/fitting-multimodal-distrubtion)). I am looking for a method which estimates parameters of a number of probability distributions - I am rather certain that there is a quite simple solution to it - maybe you know one? – Stefan Falk Jul 05 '17 at 19:52
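
On the question in the comments about writing the fitted KDE as a formula: for one-dimensional data with a Gaussian kernel, the estimate has a closed form, sketched below. Here $x_1, \dots, x_n$ are the data points and $h$ is the bandwidth. As far as I understand `gaussian_kde`, for 1-D data $h$ corresponds to `kde.factor` times the sample standard deviation (with `ddof=1`), but treat that mapping as an assumption and check it against your SciPy version.

```latex
\hat{f}_h(x) = \frac{1}{n h \sqrt{2\pi}} \sum_{i=1}^{n} \exp\!\left( -\frac{(x - x_i)^2}{2 h^2} \right)
```

Note that the formula has one term per data point, so reproducing it on paper requires publishing the data (or a subsample) together with $h$; it is not a compact parametric expression like a polynomial.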