
I am trying to fit multiple Gaussian curves to my experimental data. The Gaussian mixture model was obtained using scikit-learn's `GaussianMixture`. The GMM fit over my experimental data is shown in the image below.

GMM fit over experimental data.

As you can see, several Gaussian curves are fitted to my data. However, I only want to keep the two curves with the highest peaks and obtain their parameters, so that I can plot these two Gaussian curves independently. (Note that the mean and covariance alone are not enough to reproduce them; I also need the scaling parameter.) Is there a way to do this? My code is attached below.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import matplotlib.ticker as ticker
from sklearn.mixture import GaussianMixture
import random
## Generating random data resembling the experimental data
C1 = np.zeros(2000)
for c in range(2000):
    if c<=400:
        C1[c] = random.gauss(0.7, 0.2)
    elif c<=600:
        C1[c] = random.gauss(0.9,0.25)
    elif c<= 800:
        C1[c] = random.gauss(2.5,0.2)
    elif c<= 1200:
        C1[c] = random.gauss(1.5, 0.5)
    elif c<=1600:
        C1[c] = random.gauss(5,3.5)
    elif c<2000:
        C1[c] = random.gauss(10, 5)
C1[C1<0] = 0
C1 = np.sort(C1)
#### Plotting a normalised histogram
fig, ax = plt.subplots()
fig.set_figheight(10)
fig.set_figwidth(10)
n, bins, patches = ax.hist(C1, 
                           bins = 250,align = 'mid', density = True,color = 'grey' )

### Using machine learning, i.e. a Gaussian mixture model
### Using the GMM to predict the different Gaussian curves
X = np.array(C1)
gmm = GaussianMixture(n_components=6, random_state=0).fit(X.reshape(-1, 1))
labels = gmm.predict(X.reshape(-1,1))
gmm_y = np.exp(gmm.score_samples(X.reshape(-1, 1)))
ax.plot(X.reshape(-1,1), gmm_y, color="crimson", lw=2, label="GMM")
ax.tick_params(labelsize=26)

I found the answer to my question here. Thanks

  • Hi, I'm just learning too, but I'm wondering if the returned `weights_` attribute has the information you are looking for. Perhaps if you show some code, more specific help would be easier to provide. – rickhg12hs Jan 06 '21 at 21:44
  • Hi, @rickhg12hs thanks for the comment. Interesting, so I have `weights_`, `means_`, and `covariances_`. Do you mean to say the Gaussian curves with the highest weights are the ones with the highest peaks? If that is the case, then how can I reproduce these two Gaussian curves? Yes, I would love to share my code; it is quite small, though. The only hurdle is that the data file is large and there is no way I can share the .csv file on Stack Overflow. – Kanishk Patel Jan 07 '21 at 01:18
  • Just sharing the relevant bits of code to demonstrate the issue/problem/error is usually enough. Do the `weights_` in your example correspond with the highest peaks? – rickhg12hs Jan 07 '21 at 03:58
  • Hi @rickhg12hs thanks for taking the time to reply. I will share the relevant code. Sorry, but how can I know which `weights_` refer to which peak? – Kanishk Patel Jan 08 '21 at 06:26
  • It seems that each `.weights_` corresponds to each `.means_`. I.e., `.weights_[0]` corresponds to `.means_[0]`, etc. – rickhg12hs Jan 08 '21 at 08:39
  • Hi, @rickhg12hs Thank you very much for the help. I actually found an old StackOverflow question that answered my query. – Kanishk Patel Jan 09 '21 at 00:36
  • If you think your question is a duplicate, you can delete it. If you think your question may uniquely help someone in the future, you can answer it too. – rickhg12hs Jan 09 '21 at 03:06
  • Hi, @rickhg12hs Thanks very much for helping out. My question does have a unique perspective, so I will probably keep it and answer it, hoping that someone in the future will benefit from it. – Kanishk Patel Jan 10 '21 at 18:36
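
For anyone following the comment thread: the index correspondence between `weights_` and `means_` also holds for `covariances_`, and in 1-D the fitted density is the sum over components of `weights_[k]` times a normal pdf with mean `means_[k]` and variance `covariances_[k]`. So the "scaling parameter" of each curve is its weight. Note that the largest weight is not necessarily the tallest peak, because the peak height of component k is `weights_[k] / (sigma_k * sqrt(2*pi))`, which also depends on the width. Below is a minimal sketch of that idea, not the answer the asker found. It assumes the default `covariance_type="full"`, uses synthetic stand-in data rather than the question's data, and names such as `peak_heights`, `components`, and `top_two` are made up for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data (an assumption, not the question's data set):
# three 1-D Gaussians with different weights and widths.
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(0.0, 0.5, 1000),
    rng.normal(5.0, 1.0, 600),
    rng.normal(12.0, 2.0, 400),
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(data.reshape(-1, 1))

# Per-component parameters; index k refers to the same component in each array.
weights = gmm.weights_                      # shape (3,)
means = gmm.means_.ravel()                  # (3, 1) -> (3,)
sigmas = np.sqrt(gmm.covariances_.ravel())  # "full" covariances (3, 1, 1) -> std devs (3,)

# Height of each scaled component at its own mean.
peak_heights = weights / (sigmas * np.sqrt(2.0 * np.pi))

# Indices of the two components with the tallest peaks, tallest first.
top_two = np.argsort(peak_heights)[::-1][:2]

# Evaluate every scaled component on a grid; row k is the curve of component k.
x = np.linspace(data.min(), data.max(), 1000)
components = weights[:, None] * norm.pdf(x[None, :], means[:, None], sigmas[:, None])

# The two curves to keep.
top_two_curves = components[top_two]
```

Each row of `top_two_curves` can be drawn over a `density=True` histogram with something like `ax.plot(x, top_two_curves[0])`. As a sanity check, summing all rows of `components` should reproduce `np.exp(gmm.score_samples(x.reshape(-1, 1)))`, the full mixture curve plotted in the question.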

0 Answers