0

I have two data sets where two values where measured. I am interested in the difference between the value and the standard deviation of the difference. I made a histogram which I would like to fit two normal distributions. To calculate the difference between the maxima. I also would like to evaluate the effect that in on data set I have much less data on one value. I've already looked at this link but it is not really what I need: Python: finding the intersection point of two gaussian curves

enter image description here

enter image description here

for ii in range(2,8):
   # Kanal = ii - 1
    file = filepath + '\Mappe1.txt'
    data = np.loadtxt(file, delimiter='\t', skiprows=1)
    data = data[:,ii]
    plt.hist(data,bins=100)
    plt.xlabel("bins")
    plt.ylabel("Counts")
    plt.tight_layout()
    plt.grid()
    plt.figure()

plt.show()
Community
  • 1
  • 1
Greg.P
  • 61
  • 1
  • 2
  • 5

2 Answers2

0

Quick and dirty fitting can be readily achieved using scipy:

from scipy.optimize import curve_fit #non linear curve fitting tool
from matplotlib import pyplot as plt

def func2fit(x1,x2,m_1,m_2,std_1,std_2,height1, height2): #define a simple gauss curve
    return height1*exp(-(x1-m_1)**2/2/std_1**2)+height2*exp(-(x2-m_2)**2/2/std_2**2)

init_guess=(-.3,.3,.5,.5,3000,3000) 
#contains the initial guesses for the parameters (m_1, m_2, std_1, std_2, height1, height2) using your first figure

#do the fitting
fit_pars, pcov =curve_fit(func2fit,xdata,ydata,init_guess) 
#fit_pars contains the mean, the heights and the SD values, pcov contains the estimated covariance of these parameters 

plt.plot(xdata,func2fit(xdata,*fit_pars),label='fit') #plot the fit

For further reference consult the scipy manual page: curve_fit

Fourier
  • 2,795
  • 3
  • 25
  • 39
0

Assuming that the two samples are independent there is no need to handle this problem using curve fitting. It's basic statistics. Here's some code that does the calculations required, with the source attributed in a comment.

## adapted from http://onlinestatbook.com/2/estimation/difference_means.html

from random import gauss
from numpy import sqrt

sample_1 = [ gauss(0,1) for _ in range(10) ]
sample_2 = [ gauss(1,.5) for _ in range(20) ]

n_1 = len(sample_1)
n_2 = len(sample_2)

mean_1 = sum(sample_1)/n_1
mean_2 = sum(sample_2)/n_2

SSE = sum([(_-mean_1)**2 for _ in sample_1]) + sum([(_-mean_2)**2 for _ in sample_2])
df = (n_1-1) + (n_2-1)
MSE = SSE/df

n_h = 2 / ( 1/n_1 + 1/n_2 )
s_mean_diff = sqrt( 2* MSE / n_h )

print ( 'difference between means', abs(n_1-n_2))
print ( 'std dev of this difference', s_mean_diff )
Bill Bell
  • 21,021
  • 5
  • 43
  • 58
  • That Looks great. and it is working fin with most of my data. but other than in you exampel the two Peaks arise from one data file. so I haven tgot sample_1 and sample_2. In the nice cases the Peaks are far enough apart so i can easily split the data and us your method. But what can be done in cases like in the plot I posted, where the Peaks are so close to each other ? – Greg.P Jan 06 '17 at 08:57
  • This is the business of estimating for univariate gaussian mixture models, and I'm nothing like an expert. Beyond that, SO is meant to be about programming. I would suggest a preliminary visit to http://stats.stackexchange.com/ for up-to-date advice and maybe even advice about what software you could use. Best of luck! – Bill Bell Jan 06 '17 at 16:01