
I want to plot a Gaussian mixture distribution. I have all the parameter values I should need, but somehow they combine into what looks like a single distribution, and I'm not sure where I'm going wrong.

I tried using the solution presented in this question, but it didn't plot the mixture as a convex combination of two distributions; instead it took random samples from one distribution or the other. This is what I tried in order to solve that, but the plot looks like a single Gaussian distribution:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import random as rnd

theta = np.array([[3.91973221e-05, 2.59889568e-04], [5.32160367e-06, 4.99763548e-06],[6.65158426e-01, 3.34841574e-01]])

n = 100000
number_of_distributions = 2
mu = theta[0]
sigma = theta[1]
weights = theta[2]
samples = []

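for i in range(n):
    # Draw one value from each component, then keep one of them according to the weights
    population = [rnd.gauss(mu[k], np.sqrt(sigma[k])) for k in range(number_of_distributions)]
    # rnd.choices returns a list, so take its single element
    samples.append(rnd.choices(population, weights=weights)[0])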
sns.distplot(samples)
plt.show()


Basically, I want the graph to show more clearly that there are two separate Gaussian distributions, so that someone who hasn't seen the code can tell from the plot how big the difference between the two distributions is.

armara

1 Answer


I would say your code is fine. The problem is that your two distributions are very similar and overlap heavily. You can see a slight asymmetry at the top of the estimated pdf; everywhere else the two components are nearly identical, so the difference is invisible.
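To put a number on that overlap, here is a minimal sketch that expresses the distance between the two means in units of standard deviation. It assumes, as the question's sampling code does, that the second row of theta holds variances; the names var and separation are introduced only for this illustration.

```python
import numpy as np

# Parameters copied from the question's theta. The second row is treated as
# variances here, because the question's code takes its square root.
mu = np.array([3.91973221e-05, 2.59889568e-04])
var = np.array([5.32160367e-06, 4.99763548e-06])

std = np.sqrt(var)

# Distance between the means, in units of the larger standard deviation.
separation = abs(mu[1] - mu[0]) / std.max()
print(f"standard deviations: {std}")
print(f"separation: {separation:.3f} standard deviations")
```

The means come out roughly a tenth of a standard deviation apart, which is far too close for two separate peaks to show up in a histogram or a smooth density estimate.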

You have two options:

Option 1

If you can change your data a little bit, try to shift one of the means:

This is the result for mu[0] = 0.91973221e-02:

(Figure: Gaussian mixture distribution using distplot)
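The effect of shifting a mean can also be checked without sampling, by evaluating the mixture density directly as a weighted sum of the two normal densities and counting its peaks. This is only a sketch under the assumption that the second row of theta holds variances; mixture_pdf and count_modes are hypothetical helper names, not part of seaborn or NumPy.

```python
import numpy as np

def mixture_pdf(x, mu, var, weights):
    """Convex combination of normal densities (variances given)."""
    x = np.asarray(x)[:, None]  # column vector, broadcasts against the components
    comp = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return comp @ weights  # weighted sum over the components

def count_modes(y):
    """Count interior local maxima of a curve sampled on a grid."""
    inner = y[1:-1]
    return int(np.sum((inner > y[:-2]) & (inner >= y[2:])))

var = np.array([5.32160367e-06, 4.99763548e-06])
weights = np.array([6.65158426e-01, 3.34841574e-01])
mu_original = np.array([3.91973221e-05, 2.59889568e-04])
mu_shifted = np.array([0.91973221e-02, 2.59889568e-04])  # Option 1: shifted mu[0]

x = np.linspace(-0.015, 0.025, 4001)
pdf_original = mixture_pdf(x, mu_original, var, weights)
pdf_shifted = mixture_pdf(x, mu_shifted, var, weights)

modes_original = count_modes(pdf_original)
modes_shifted = count_modes(pdf_shifted)
print(modes_original, modes_shifted)
```

Passing x and either pdf array to plt.plot draws the exact mixture curve, which avoids the sampling noise and the smoothing of a kernel density estimate.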

Option 2

If you have to work with that data set and cannot change mu or sigma, you can adjust the bandwidth parameter bw of the kde function (and increase the number of bins in your histogram). For small enough bw values the estimated pdf is less smooth, so you can see the two peaks of your distributions:

sns.distplot(samples, bins=400, kde_kws={"bw": 0.004})
plt.xlim(-0.015, 0.015)

Don't forget to adjust the xlim of your plot. For small bw values the curve can look strange:

(Figure: Different bandwidth values of the kde plot)

Just to be sure

You took the square root of your sigma parameters when generating the random numbers. That is correct if sigma holds variances. If sigma holds standard deviations in your setup, use it directly.
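For reference, a vectorized sketch of the same sampling scheme with NumPy's Generator, with the variance-versus-standard-deviation choice made explicit. It assumes sigma holds variances; the seed and the names rng, std and component are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

mu = np.array([3.91973221e-05, 2.59889568e-04])
sigma = np.array([5.32160367e-06, 4.99763548e-06])  # assumed to be variances
weights = np.array([6.65158426e-01, 3.34841574e-01])
n = 100_000

# rng.normal expects a standard deviation as its scale argument.
# If sigma already holds standard deviations, use std = sigma instead.
std = np.sqrt(sigma)

# Pick a component index for every sample, then draw from that component.
component = rng.choice(len(mu), size=n, p=weights / weights.sum())
samples = rng.normal(mu[component], std[component])
```

The resulting samples array is one-dimensional, so it can be passed to the plotting call in place of the list built in the loop.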

Anton