I have a space-separated csv
file containing a measurement. First column is the time of measurement, second column is the corresponding measured value, third column is the error. The file can be found here. I would like to fit the parameters a_i, f, phi_n
of the function g to the data, using Python:
Reading the data in:
import numpy as np
data=np.genfromtxt('signal.data')
time=data[:,0]
signal=data[:,1]
signalerror=data[:,2]
Plot the data:
import matplotlib.pyplot as plt
plt.figure()
plt.plot(time,signal)
plt.scatter(time,signal,s=5)
plt.show()
Getting the result:
Now lets calculate a preliminary frequency guess of the periodic signal:
from gatspy.periodic import LombScargleFast
dmag=0.000005
nyquist_factor=40
model = LombScargleFast().fit(time, signal, dmag)
periods, power = model.periodogram_auto(nyquist_factor)
model.optimizer.period_range=(0.2, 10)
period = model.best_period
We get the result: 0.5467448186001437
I define the function to fit as follows, for N=10
:
def G(x, A_0,
A_1, phi_1,
A_2, phi_2,
A_3, phi_3,
A_4, phi_4,
A_5, phi_5,
A_6, phi_6,
A_7, phi_7,
A_8, phi_8,
A_9, phi_9,
A_10, phi_10,
freq):
return (A_0 + A_1 * np.sin(2 * np.pi * 1 * freq * x + phi_1) +
A_2 * np.sin(2 * np.pi * 2 * freq * x + phi_2) +
A_3 * np.sin(2 * np.pi * 3 * freq * x + phi_3) +
A_4 * np.sin(2 * np.pi * 4 * freq * x + phi_4) +
A_5 * np.sin(2 * np.pi * 5 * freq * x + phi_5) +
A_6 * np.sin(2 * np.pi * 6 * freq * x + phi_6) +
A_7 * np.sin(2 * np.pi * 7 * freq * x + phi_7) +
A_8 * np.sin(2 * np.pi * 8 * freq * x + phi_8) +
A_9 * np.sin(2 * np.pi * 9 * freq * x + phi_9) +
A_10 * np.sin(2 * np.pi * 10 * freq * x + phi_10))
Now we need a function which fits G
:
def fitter(time, signal, signalerror, LSPfreq):
from scipy import optimize
pfit, pcov = optimize.curve_fit(lambda x, _A_0,
_A_1, _phi_1,
_A_2, _phi_2,
_A_3, _phi_3,
_A_4, _phi_4,
_A_5, _phi_5,
_A_6, _phi_6,
_A_7, _phi_7,
_A_8, _phi_8,
_A_9, _phi_9,
_A_10, _phi_10,
_freqfit:
G(x, _A_0, _A_1, _phi_1,
_A_2, _phi_2,
_A_3, _phi_3,
_A_4, _phi_4,
_A_5, _phi_5,
_A_6, _phi_6,
_A_7, _phi_7,
_A_8, _phi_8,
_A_9, _phi_9,
_A_10, _phi_10,
_freqfit),
time, signal, p0=[11, 2, 0, #p0 is the initial guess for numerical fitting
1, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
LSPfreq],
sigma=signalerror, absolute_sigma=True)
error = [] # DEFINE LIST TO CALC ERROR
for i in range(len(pfit)):
try:
error.append(np.absolute(pcov[i][i]) ** 0.5) # CALCULATE SQUARE ROOT OF TRACE OF COVARIANCE MATRIX
except:
error.append(0.00)
perr_curvefit = np.array(error)
return pfit, perr_curvefit
Check what we got:
LSPfreq=1/period
pfit, perr_curvefit = fitter(time, signal, signalerror, LSPfreq)
plt.figure()
model=G(time,pfit[0],pfit[1],pfit[2],pfit[3],pfit[4],pfit[5],pfit[6],pfit[7],pfit[8],pfit[8],pfit[10],pfit[11],pfit[12],pfit[13],pfit[14],pfit[15],pfit[16],pfit[17],pfit[18],pfit[19],pfit[20],pfit[21])
plt.scatter(time,model,marker='+')
plt.plot(time,signal,c='r')
plt.show()
Yielding:
Which is clearly wrong. If I play with the initial guesses p0
in the definition of function fitter
, I can get a slightly better result. Setting
p0=[11, 1, 0,
0.1, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
0, 0,
LSPfreq]
Gives us (zoomed in):
Which is a bit better. High frequency components are still in, despite the amplitude of the high frequency compontents were guessed to be zero. The original p0
seems also more justified than the modified version based on visual inspection of the data.
I played around with different values for p0
, and while changing p0
indeed changes the result, I do not get a line reasonably well fitted to the data.
Why does this model fitting method fail? How can I improve get a good fit?
The whole code can be found here.
The original version of this question was posted here.
EDIT:
PyCharm gives a warning for the p0
part of the code:
Expected type 'Union[None,int,float,complex]', got 'List[Union[int,float],Any]]' instead
which with I don't know how to deal with, but might be relevant.