Python: two normal distribution

Question

I have two data sets where two values where measured. I am interested in the difference between the value and the standard deviation of the difference. I made a histogram which I would like to fit two normal distributions. To calculate the difference between the maxima. I also would like to evaluate the effect that in on data set I have much less data on one value. I've already looked at this link but it is not really what I need: Python: finding the intersection point of two gaussian curves

for ii in range(2,8):
   # Kanal = ii - 1
    file = filepath + '\Mappe1.txt'
    data = np.loadtxt(file, delimiter='\t', skiprows=1)
    data = data[:,ii]
    plt.hist(data,bins=100)
    plt.xlabel("bins")
    plt.ylabel("Counts")
    plt.tight_layout()
    plt.grid()
    plt.figure()

plt.show()

What is `plt`? Such information is both clearly relevant and missing from your question. — John Coleman, Jan 05 '17 at 13:35
This could become tricky if you want to use gaussian mixture models. Read up on those. — Dschoni, Jan 05 '17 at 14:30

Fourier · Answer 1 · 2017-01-05T14:20:49.140

Quick and dirty fitting can be readily achieved using scipy:

from scipy.optimize import curve_fit #non linear curve fitting tool
from matplotlib import pyplot as plt

def func2fit(x1,x2,m_1,m_2,std_1,std_2,height1, height2): #define a simple gauss curve
    return height1*exp(-(x1-m_1)**2/2/std_1**2)+height2*exp(-(x2-m_2)**2/2/std_2**2)

init_guess=(-.3,.3,.5,.5,3000,3000) 
#contains the initial guesses for the parameters (m_1, m_2, std_1, std_2, height1, height2) using your first figure

#do the fitting
fit_pars, pcov =curve_fit(func2fit,xdata,ydata,init_guess) 
#fit_pars contains the mean, the heights and the SD values, pcov contains the estimated covariance of these parameters 

plt.plot(xdata,func2fit(xdata,*fit_pars),label='fit') #plot the fit

For further reference consult the scipy manual page: curve_fit

score 0 · Answer 2 · answered Jan 05 '17 at 15:04

0

Assuming that the two samples are independent there is no need to handle this problem using curve fitting. It's basic statistics. Here's some code that does the calculations required, with the source attributed in a comment.

## adapted from http://onlinestatbook.com/2/estimation/difference_means.html

from random import gauss
from numpy import sqrt

sample_1 = [ gauss(0,1) for _ in range(10) ]
sample_2 = [ gauss(1,.5) for _ in range(20) ]

n_1 = len(sample_1)
n_2 = len(sample_2)

mean_1 = sum(sample_1)/n_1
mean_2 = sum(sample_2)/n_2

SSE = sum([(_-mean_1)**2 for _ in sample_1]) + sum([(_-mean_2)**2 for _ in sample_2])
df = (n_1-1) + (n_2-1)
MSE = SSE/df

n_h = 2 / ( 1/n_1 + 1/n_2 )
s_mean_diff = sqrt( 2* MSE / n_h )

print ( 'difference between means', abs(n_1-n_2))
print ( 'std dev of this difference', s_mean_diff )

answered Jan 05 '17 at 15:04

Bill Bell

21,021
5
43
58

That Looks great. and it is working fin with most of my data. but other than in you exampel the two Peaks arise from one data file. so I haven tgot sample_1 and sample_2. In the nice cases the Peaks are far enough apart so i can easily split the data and us your method. But what can be done in cases like in the plot I posted, where the Peaks are so close to each other ? – Greg.P Jan 06 '17 at 08:57
This is the business of estimating for univariate gaussian mixture models, and I'm nothing like an expert. Beyond that, SO is meant to be about programming. I would suggest a preliminary visit to http://stats.stackexchange.com/ for up-to-date advice and maybe even advice about what software you could use. Best of luck! – Bill Bell Jan 06 '17 at 16:01

Python: two normal distribution

2 Answers2