
Let's assume we have a linear combination of two normal distributions. I think one would call the result a multimodal distribution.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

ls = np.linspace(0, 60, 1000)

# sum of two normal pdfs: N(mu=0, sigma=5) and N(mu=20, sigma=10)
distribution = norm.pdf(ls, 0, 5) + norm.pdf(ls, 20, 10)
distribution = (distribution * 1000).astype(int)
distribution = distribution / distribution.sum()  # normalize so the values sum to 1

plt.plot(ls, distribution)
plt.show()

[Plot of the resulting bimodal distribution]

As you can see, this is a linear combination of two normal distributions with parameters (mu1 = 0, s1 = 5) and (mu2 = 20, s2 = 10). Of course, we usually do not know these parameters beforehand.

I would like to know how I can estimate or fit those parameters (the mus and sigmas). I am confident there are methods for doing this, but I couldn't find any yet.

– Stefan Falk

2 Answers


The problem that you describe is a special case of a Gaussian mixture model. In order to estimate the parameters, you need some samples. If you don't have samples but are given the curve, you can produce samples based on the curve. Then you can use the expectation-maximization (EM) algorithm to estimate the parameters. Scikit-learn has a class that lets you do this: sklearn.mixture.GaussianMixture. You just need to provide your samples, the number of components (n_components), which is 2 in your case, and a covariance type, which would be full in your case, since you have no prior assumptions about the covariance matrix.
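For illustration, here is a minimal sketch of that workflow (not taken from the answer; variable names and sample sizes are my own): rebuild the curve from the question, draw samples by treating the grid points as a discrete distribution weighted by the curve, and fit a two-component GaussianMixture.

# A rough, illustrative sketch; the sampling step and sample size are assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Rebuild the curve from the question and normalize it so it can serve
# as sampling weights over the grid points.
ls = np.linspace(0, 60, 1000)
curve = norm.pdf(ls, 0, 5) + norm.pdf(ls, 20, 10)
weights = curve / curve.sum()

# "Produce some samples based on the curve": draw grid points with
# probability proportional to the curve height.
rng = np.random.default_rng(0)
samples = rng.choice(ls, size=5000, p=weights)

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = samples.reshape(-1, 1)

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm.fit(X)

print(gmm.means_.ravel())                 # estimated mus
print(np.sqrt(gmm.covariances_).ravel())  # estimated sigmas
print(gmm.weights_)                       # estimated mixture weights

Note that the grid in the question starts at 0, so the component centered at 0 is truncated and its estimated mean and sigma will be somewhat biased.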

– Miriam Farber
  • Ah! I knew that I should already know that! I was looking for maximum likelihood methods but somehow I failed to look that up! Thanks, this should work :) – Stefan Falk Jul 05 '17 at 20:43
  • Hey :) I am having a follow up question to my original one. Maybe you [want to take a look](https://stats.stackexchange.com/questions/289490/how-can-i-model-such-a-distribution-consisting-of-a-mix-of-different-distributio) at it? – Stefan Falk Jul 08 '17 at 11:31

You might want to use the Expectation Maximization algorithm.

It is an iterative approach for fitting a mixture model. There is a very convenient implementation in scikit-learn: GaussianMixture.

I found it hard to figure out how to structure the data for this algorithm to work, so I set up a sample for you: https://nbviewer.jupyter.org/gist/lhk/e566e2d6b67992eca062f9d96e2a14a2
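
In case the notebook is unavailable, here is a rough sketch of the data layout GaussianMixture expects (the data below is synthetic): each observation is a row, so one-dimensional samples have to be reshaped into a single column.

# Rough sketch of the expected input layout; the data here is made up.
import numpy as np
from sklearn.mixture import GaussianMixture

# Pretend these are observed values drawn from the two components.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 5, 2000), rng.normal(20, 10, 3000)])

# fit() wants shape (n_samples, n_features), so a 1-D array of scalar
# observations becomes a single-column matrix.
X = data.reshape(-1, 1)

gmm = GaussianMixture(n_components=2).fit(X)
print(gmm.means_.ravel())  # roughly [0, 20] (component order is arbitrary)
print(gmm.weights_)        # roughly [0.4, 0.6]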

– lhk
  • Is there a chance you can [help out here](https://stats.stackexchange.com/questions/289490/how-can-i-model-such-a-distribution-consisting-of-a-mix-of-different-distributio) as well? The question is a follow up from this one. I didn't consider "border" cases. – Stefan Falk Jul 08 '17 at 11:32