
I have a well-behaved data set, and I am trying to fit it with a sum of two Gaussians. My code for fitting this data set is:

from scipy import stats  
from pylab import *
from scipy.optimize import curve_fit
from scipy.integrate import*
from lmfit import Model

data=concatenate((normal(1,.2,5000),normal(2,.2,2500)))
y,x,_=hist(data,100,alpha=.3,label='data')    
x=(x[1:]+x[:-1])/2


def gaussian1(x, amp1, cen1, wid1):
    "1-d gaussian: gaussian(x, amp, cen, wid)"
    return (amp1/(sqrt(2*pi)*wid1)) * exp(-(x-cen1)**2 /(2*wid1**2))

def gaussian2(x, amp2, cen2, wid2):
    "1-d gaussian: gaussian(x, amp, cen, wid)"
    return (amp2/(sqrt(2*pi)*wid2)) * exp(-(x-cen2)**2 /(2*wid2**2))

    
gmodel = Model(gaussian1) + Model(gaussian2)
#pars  = gmodel.make_params( amp1=20,cen1=1.0,wid1=0.2,amp2=10,cen2=2.0,wid2=0.19)

pars  = gmodel.make_params(amp1=260,cen1=1.0,wid1=0.2,amp2=140,cen2=2.0,wid2=0.19)

result = gmodel.fit(y, pars, x=x)    
print(result.fit_report())        

#plt.plot(x, y, 'bo')
plt.plot(x, result.init_fit, 'k--',label='Initial Fit')
plt.plot(x, result.best_fit, 'r-',label='Best Fit')
plt.show()

This returns the following output and plot:

[[Model]]
    (Model(gaussian1) + Model(gaussian2))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 29
    # data points      = 100
    # variables        = 6
    chi-square         = 8297.41156
    reduced chi-square = 88.2703358
    Akaike info crit   = 453.852870
    Bayesian info crit = 469.483891
[[Variables]]
    amp1:  122.899857 +/- 1.52937166 (1.24%) (init = 260)
    cen1:  1.00027783 +/- 0.00288716 (0.29%) (init = 1)
    wid1:  0.20129987 +/- 0.00290241 (1.44%) (init = 0.2)
    amp2:  61.0231508 +/- 1.51470406 (2.48%) (init = 140)
    cen2:  1.99916357 +/- 0.00564876 (0.28%) (init = 2)
    wid2:  0.19744994 +/- 0.00567828 (2.88%) (init = 0.19)
[[Correlations]] (unreported correlations are < 0.100)
    C(amp2, wid2) =  0.581
    C(amp1, wid1) =  0.581
    C(wid1, wid2) = -0.103

(plot: histogram of the data with the initial and best fits)

The plot for this fit looks fine. The problem is that the two fitted amplitude values amp1 and amp2 do not match the peak heights shown in the plot:

According to the plot, amp1 = 255 and amp2 = 125.

According to the printed values, amp1 = 122.89 and amp2 = 61.02.

Another issue is that the fit returns a very high chi-square value (8297.41).

How can I reconcile the values of amp1 and amp2 with the plot, and what explains the large chi-square?


3 Answers


First things first - the pylab interface is discouraged by matplotlib, and import * is discouraged because it clutters the namespace and can inadvertently overwrite other functions. Your imports should be

from lmfit import Model
import numpy as np
from matplotlib import pyplot as plt

Now to your actual problem. You defined the peak height of each Gaussian as ampN/(np.sqrt(2*np.pi)*widN), not as ampN, so you have to calculate the actual amplitude A shown in the figure accordingly:

A1 = result.params["amp1"].value/(np.sqrt(2*np.pi)*result.params["wid1"].value)
A2 = result.params["amp2"].value/(np.sqrt(2*np.pi)*result.params["wid2"].value)

Et voila, these are indeed the amplitudes you see in the plot.
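For example, plugging the numbers from the fit report above into that formula (your exact values will differ slightly because the data are random):

import numpy as np

A1 = 122.899857 / (np.sqrt(2*np.pi) * 0.20129987)  # ~ 244
A2 = 61.0231508 / (np.sqrt(2*np.pi) * 0.19744994)  # ~ 123
print(A1, A2)  # roughly the peak heights read off the histogram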

Mr. T

Do you have a reason to believe that the Y-axis shows the amp values rather than the sum of the two Gaussians?

The fit is sublime. What is plotted on the Y-axis is not amp but the sum of gaussian1 and gaussian2 evaluated with their fitted parameters. I simply plugged the values from your output into the following expression and plotted it:

RecreatedVals = gaussian1(x, 122.899, 1.000,0.201)+gaussian2(x, 61.023, 1.999, 0.197)

(plot: recreated values and best fit, superimposed)

As you can see, the recreated values and your best fit lie on top of each other. The fitting worked out well. You need not worry.

The edited code, in case you need it, is below.

from scipy import stats  
from pylab import *
from scipy.optimize import curve_fit
from scipy.integrate import*
from lmfit import Model, minimize

data=concatenate((normal(1,.2,5000),normal(2,.2,2500)))
y,x,_=hist(data,100,alpha=.3,label='data')

x=(x[1:]+x[:-1])/2



def gaussian1(x, amp1, cen1, wid1):
    "1-d gaussian: gaussian(x, amp, cen, wid)"
    return (amp1/(sqrt(2*pi)*wid1)) * exp(-(x-cen1)**2 /(2*wid1**2))

def gaussian2(x, amp2, cen2, wid2):
    "1-d gaussian: gaussian(x, amp, cen, wid)"
    return (amp2/(sqrt(2*pi)*wid2)) * exp(-(x-cen2)**2 /(2*wid2**2))



gmodel = Model(gaussian1) + Model(gaussian2)
#pars  = gmodel.make_params( amp1=20,cen1=1.0,wid1=0.2,amp2=10,cen2=2.0,wid2=0.19)

pars  = gmodel.make_params(amp1=260,cen1=1.0,wid1=0.2,amp2=140,cen2=2.0,wid2=0.19)

result = gmodel.fit(y, pars, x=x)

print(result.fit_report())

RecreatedVals = gaussian1(x, 122.899, 1.000,0.201)+gaussian2(x, 61.023, 1.999, 0.197)

#plt.plot(x, y, 'bo')
plt.plot(x, result.init_fit, 'k--',label='Initial Fit')
plt.plot(x, result.best_fit, 'r-',label='Best Fit')
plt.plot(x, RecreatedVals, 'g--', label="Back Calculated Values")
plt.show()
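As a side note, instead of hard-coding the rounded numbers, the recreated curve can also be obtained by evaluating the model at the best-fit parameters with lmfit's eval method (a minimal sketch, equivalent to the hand-typed line above up to rounding):

RecreatedVals = gmodel.eval(result.params, x=x)
# or, equivalently, via the fit result itself
RecreatedVals = result.eval(x=x)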
Amit

Just to echo and re-phrase what both @Amit and @MrT said:

The amplitude parameter here is not the maximum height of the Gaussian. It is the amplitude of the unit-normalized Gaussian - the area under the curve, if you like. In fact, if you had used the built-in GaussianModel from lmfit, it would have reported not only this amplitude but also height and fwhm parameters:

import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel

np.random.seed(1)

data = np.concatenate((np.random.normal(1,.2,5000),
                       np.random.normal(2,.2,2500)))

y, x, _ = plt.hist(data,100,alpha=.3,label='data')    
x = (x[1:]+x[:-1])/2

gmodel = GaussianModel(prefix='p1_') + GaussianModel(prefix='p2_')
params = gmodel.make_params(p1_amplitude=20, p1_center=1.0, p1_sigma=0.2,
                            p2_amplitude=20, p2_center=2.0, p2_sigma=0.2)

result = gmodel.fit(y, params, x=x)    
print(result.fit_report())        

plt.plot(x, result.init_fit, 'k--',label='Initial Fit')
plt.plot(x, result.best_fit, 'r-',label='Best Fit')
plt.show()

which would have given

[[Model]]
    (Model(gaussian, prefix='p1_') + Model(gaussian, prefix='p2_'))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 36
    # data points      = 100
    # variables        = 6
    chi-square         = 8131.43279
    reduced chi-square = 86.5046041
    Akaike info crit   = 451.832224
    Bayesian info crit = 467.463245
[[Variables]]
    p1_amplitude:  116.269963 +/- 1.46784977 (1.26%) (init = 20)
    p1_center:     1.00574290 +/- 0.00290097 (0.29%) (init = 1)
    p1_sigma:      0.19941410 +/- 0.00291806 (1.46%) (init = 0.2)
    p2_amplitude:  57.9229780 +/- 1.45262882 (2.51%) (init = 20)
    p2_center:     1.98961370 +/- 0.00564475 (0.28%) (init = 2)
    p2_sigma:      0.19531569 +/- 0.00567660 (2.91%) (init = 0.2)
    p1_fwhm:       0.46958430 +/- 0.00687150 (1.46%) == '2.3548200*p1_sigma'
    p1_height:     232.606458 +/- 2.93016223 (1.26%) == '0.3989423*p1_amplitude/max(2.220446049250313e-16, p1_sigma)'
    p2_fwhm:       0.45993329 +/- 0.01336736 (2.91%) == '2.3548200*p2_sigma'
    p2_height:     118.310650 +/- 2.96047824 (2.50%) == '0.3989423*p2_amplitude/max(2.220446049250313e-16, p2_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
    C(p1_amplitude, p1_sigma) =  0.581
    C(p2_amplitude, p2_sigma) =  0.581
    C(p1_sigma, p2_sigma)     = -0.107
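Since fwhm and height are stored as ordinary derived parameters in result.params, they can also be read off programmatically, for example:

print(result.params['p1_height'].value, result.params['p1_height'].stderr)
print(result.params['p2_fwhm'].value)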
M Newville