1

I need a hand to fine tune the plot resulting from my code. the code is really rude, but basically it serves to fit some data, that have two peaks in the ditribution of counts versus time. The rising part of each peak is fitted with a gaussian, and the decaying part with an exponential. I need to fine tuning the fits, cause, as it is clear from the plot, the data are fitted but not in the best way. I need to avoid the discontinuites between the different functions (so the functions have to "touch" each other), and I would like to obtain fits that really follow the data and behave according to their definition (i.e., the first gaussian has not "bell" shape at the peak, and the second gaussian stops "too soon"). The code take the data from the web, so it is directly executable. Hopefully, the code and the image will be clearer than my words. Many thanks in advance.

#!/usr/bin/env python

import pyfits, os, re, glob, sys
from scipy.optimize import leastsq
from numpy import *
from pylab import *
from scipy import *

# ---------------- Functions ---------------------------#
def right_exp(p, x, y, err1):
    yfit1 = p[0]*exp(-p[2]*(x - p[1]))
    dev_exp = (y - yfit1)/err1
    return dev_exp

def left_gauss(p, x, y, err2):
    yfit2 = p[0]*(1/sqrt(2*pi*(p[2]**2)))*exp(-(x - p[1])**2/(2*p[2]**2))
    dev_gauss = (y - yfit2)/err2
    return dev_gauss

# ------------------------------------------------------ #
tmin = 56200
tmax = 56249

data=pyfits.open('http://heasarc.gsfc.nasa.gov/docs/swift/results/transients/weak/GX304-1.orbit.lc.fits')

time  = data[1].data.field(0)/86400. + data[1].header['MJDREFF'] + data[1].header['MJDREFI']
rate  = data[1].data.field(1)
error = data[1].data.field(2)
data.close()

cond1 = ((time > 56200) & (time < 56209)) #| ((time > 56225) & (time < 56234))
time1 = time[cond1]
rate1 = rate[cond1]
error1 = error[cond1]
cond2 = ((time > 56209) & (time < 56225)) #| ((time > 56234) & (time < 56249))
time2 = time[cond2]
rate2 = rate[cond2]
error2 = error[cond2]
cond3 = ((time > 56225) & (time < 56234))
time3 = time[cond3]
rate3 = rate[cond3]
error3 = error[cond3]
cond4 = ((time > 56234) & (time < 56249))
time4 = time[cond4]
rate4 = rate[cond4]
error4 = error[cond4]

totaltime = np.append(time1, time2)
totalrate = np.append(rate1, rate2)
v0= [0.23, 56209.0, 1] #inital guesses for Gaussian Fit, just do it around the peaks
v1= [0.40, 56233.0, 1]

# ------------------------ First peak -------------------------------------------------------------------#
out = leastsq(left_gauss, v0[:], args=(time1, rate1, error1), maxfev = 100000, full_output = 1)
p = out[0]
v = out[0]
xxx = arange(min(time1), max(time1), time1[1] - time1[0])
yfit1 = p[0]*(1/sqrt(2*pi*(p[2]**2)))*exp(-(xxx - p[1])**2/(2*p[2]**2))

out2 = leastsq(right_exp, v0[:], args = (time2, rate2, error2), maxfev = 100000, full_output = 1)
p2 = out2[0]
v2 = out2[0]
xxx2 = arange(min(time2), max(time2), time2[1] - time2[0])
yfit2 = p2[0]*exp(-p2[2]*(xxx2 - p2[1]))

# ------------------------ Second peak -------------------------------------------------------------------#
out3 = leastsq(left_gauss, v1[:], args=(time3, rate3, error3), maxfev = 100000, full_output = 1)
p3 = out3[0]
v3 = out3[0]
xxx3 = arange(min(time3), max(time3), time3[1] - time3[0])
yfit3 = p3[0]*(1/sqrt(2*pi*(p3[2]**2)))*exp(-(xxx3 - p3[1])**2/(2*p3[2]**2))

out4 = leastsq(right_exp, v1[:], args = (time4, rate4, error4), maxfev = 100000, full_output = 1)
p4 = out4[0]
v4 = out4[0]
xxx4 = arange(min(time4), max(time4), time4[1] - time4[0])
yfit4 = p4[0]*exp(-p4[2]*(xxx4 - p4[1]))
# ------------------------------------------------------------------------------------------------------- #

fig = figure(figsize = (9, 9)) #make a plot
ax1 = fig.add_subplot(111)
ax1.plot(time, rate, 'g.')
ax1.plot(xxx, yfit1, 'b-')
ax1.plot(xxx2, yfit2, 'b-')
ax1.plot(xxx3, yfit3, 'b-')
ax1.plot(xxx4, yfit4, 'b-')
axis([tmin, tmax, -0.00, 0.45])
savefig("first peak.png")

first peak.png

DSM
  • 342,061
  • 65
  • 592
  • 494
Py-ser
  • 1,860
  • 9
  • 32
  • 58
  • 1
    `pyfits` is not defined. Also, you should not fit random functions, if you don't have a model that suggests a certain functional form. If you want to improve your data presentation, you might want to look into smoothing of your data. Fitting is usually done to determine model parameters, or to show that a certain model can explain data. – David Zwicker Mar 21 '13 at 15:47
  • Thanks David. 1) What do you mean with: pyfits is not defined? Cannot you run the code? 2) If I can find a model for the data, then I have to try some fit functions... and maybe find the best-fti parameter (if I have properly understood your statement). 3) What do you mean with smoothing my data? Any example? Many thanks again – Py-ser Mar 21 '13 at 15:53
  • Ok, I didn't know `pyfits` was a package and I must have missed it in your import statement. Sorry for that. Otherwise, I doubt that it makes sense to fit any function to data, if you do not have model. What are you trying to achieve with the fit? – David Zwicker Mar 21 '13 at 16:03
  • The fit is to represent the data behavior but avoiding the confusion of too many data points and big error bars, as you can see here: http://stackoverflow.com/questions/15410885/overplot-data-with-multiple-x-axis-in-python. Moreover, it seems that those data have a quite clear exponential decay, and also the derivation of those parameters is important. Finally, the center of the gaussian is important too. – Py-ser Mar 21 '13 at 16:08
  • 1
    Well, those are different requirements. I would probably fit the data piecewise, depending on what you're asking for. Just fit two Gaussians to the maxima to get their position. The exponential tails are most easily seen in a semi-logarithmic plot, where they appear as straight lines. I would thus do a separate plot to investigate them. – David Zwicker Mar 21 '13 at 16:13
  • Thanks again David. However, I need to plot the fit exactly in the way I asked for, for a number of reasons, and for the nature itself of the data (it makes no sense to plot a straight "lightcurve"). – Py-ser Mar 21 '13 at 16:24

1 Answers1

2

Using trigonometric series can handle very well this problem in order to create a continuous function. The example below works if pasted after your code. You can change the number of terms in the trigonometric series if you need.

enter image description here

import numpy as np
from scipy.optimize import curve_fit
x = np.concatenate((time1, time2, time3, time4))
y_points = np.concatenate((rate1, rate2, rate3, rate4))
den = x.max() - x.min()
def func(x, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15):
    return  a1 *sin( 1*pi*x/den)+\
            a2 *sin( 2*pi*x/den)+\
            a3 *sin( 3*pi*x/den)+\
            a4 *sin( 4*pi*x/den)+\
            a5 *sin( 5*pi*x/den)+\
            a6 *sin( 6*pi*x/den)+\
            a7 *sin( 7*pi*x/den)+\
            a8 *sin( 8*pi*x/den)+\
            a9 *sin( 9*pi*x/den)+\
            a10*sin(10*pi*x/den)+\
            a11*sin(11*pi*x/den)+\
            a12*sin(12*pi*x/den)+\
            a13*sin(13*pi*x/den)+\
            a14*sin(14*pi*x/den)+\
            a15*sin(15*pi*x/den)
popt, pcov = curve_fit(func, x, y_points)
y = func(x, *popt)
plot(x,y, color='r', linewidth=2.)
show()

EDIT

As suggested by @Alfe, this fitting function could be written in a compact format like:

def func(x, a):
    return sum(a_i * sin(i * pi * x / den) for i, a_i in enumerate(a, 1))
Saullo G. P. Castro
  • 56,802
  • 26
  • 179
  • 234
  • 2
    @SaulloCastro I actually just rejected the edit, since it was inherently changing your code (we cannot see comments in the review queue so I didn't see you asking him to). The other reviewers rejected for the same reason I see, so in general keep in mind that 'substantial edits by someone else that change an answer radically' will be rejected, so you need to do it yourself :) – Niels Keurentjes May 12 '13 at 19:39
  • 1
    @Niels thank you by the information, I was not aware of this practice! – Saullo G. P. Castro May 12 '13 at 19:40
  • 1
    `def func(x, a):`, `return sum(a_i * sin(i * pi * x / den) for i, a_i in enumerate(a, 1))` would make the code much DRYer and allow passing an array of *a* coefficients of arbitrary size. – Alfe Jan 30 '18 at 16:48