5

I am trying to figure out what it is I don't understand here.

I am following http://www.scipy.org/Cookbook/FittingData and trying to fit a sine wave. The real problem is satellite magnetometer data which makes a nice sine wave on a spinning spacecraft. I created a dataset then am trying to fit it to recover the inputs.

Here is my code:

import numpy as np
from scipy import optimize

from scipy.optimize import curve_fit, leastsq

import matplotlib.pyplot as plt


class Parameter:
    def __init__(self, value):
            self.value = value

    def set(self, value):
            self.value = value

    def __call__(self):
            return self.value

def fit(function, parameters, y, x = None):
    def f(params):
        i = 0
        for p in parameters:
            p.set(params[i])
            i += 1
        return y - function(x)

    if x is None: x = np.arange(y.shape[0])
    p = [param() for param in parameters]
    return optimize.leastsq(f, p, full_output=True, ftol=1e-6, xtol=1e-6)

# generate a perfect data set (my real data have tiny error)
def mysine(x, a1, a2, a3):
    return a1 * np.sin(a2 * x + a3)

xReal = np.arange(500)/10.
a1 = 200.
a2 = 2*np.pi/10.5  # omega, 10.5 is the period
a3 = np.deg2rad(10.) # 10 degree phase offset
yReal = mysine(xReal, a1, a2, a3)

# plot the real data
plt.figure(figsize=(15,5))
plt.plot(xReal, yReal, 'r', label='Real Values')

# giving initial parameters
amplitude = Parameter(175.)
frequency = Parameter(2*np.pi/8.)
phase = Parameter(0.0)

# define your function:
def f(x): return amplitude() * np.sin(frequency() * x + phase())

# fit! (given that data is an array with the data to fit)
out = fit(f, [amplitude, frequency, phase], yReal, xReal)
period = 2*np.pi/frequency()
print amplitude(), period, np.rad2deg(phase())

xx = np.linspace(0, np.max(xReal), 50)
plt.plot( xx, f(xx) , label='fit')
plt.legend(shadow=True, fancybox=True)

Which makes this plot: enter image description here

The recovered fit parameters of [44.2434221897 8.094832581 -61.6204033699] have no resemblance to what I started with.

Any thoughts on what I am not understanding or doing wrong?

scipy.__version__
'0.10.1'

Edit: Fixing one parameter was suggested. In the example above fixing the amplitude to np.histogram(yReal)[1][-1] still produces unacceptable output. Fits: [175.0 8.31681375217 6.0] Should I try a different fitting method? Suggestions on which?

enter image description here

Brian Larsen
  • 1,740
  • 16
  • 28

2 Answers2

2

From what I can see from playing a bit with leastsq (without fancy stuff from the cookbook, just plain direct calls to leastsq --- and by the way, full_output=True is your friend here), is that it's very hard to fit all three of the amplitude, frequency and phase in one go. On the other hand, if I fix the amplitude and fit the frequency and phase, it works; if I fix the frequency and fit the amplitude and phase, it works too.

There is more than one way out here. What might be the simplest one --- if you are sure you only have one sine wave (and this is easy to check with the Fourier transform), then you know the frequency from just the distance between consecutive maxima of your signal. Then fit the two remaining parameters.

If what you have is a mixture of several harmonics, well, again, Fourier transform will tell you that.

ev-br
  • 24,968
  • 9
  • 65
  • 78
  • Thanks, I am sure I have just one sine wave but it changes amplitude a lot and frequency just a little in time. I will see about fixing the amplitude via just grabbing the high values from the data. – Brian Larsen Nov 15 '12 at 21:41
  • I would fix the frequency, not amplitude. Alternatively, convert everything into the Fourier space and do all the fitting in the frequency domain. – ev-br Nov 15 '12 at 21:55
  • Yeah, this issue here is that the spacecraft is changing frequency as it enters eclipse and I only care about the frequency and phase and not the amplitude so that seems more natural. – Brian Larsen Nov 15 '12 at 22:03
  • well, if the frequency is not constant in time, it's not a single sine wave, there's no way you can fit it with one. Hence, working in frequency domain looks way more promising. – ev-br Nov 15 '12 at 22:06
  • I will take a look at that. The period changes slowly therefore I figured I could fit in segments. I will look at the frequency domain methods, I am much less comfortable there so was trying to avoid that. – Brian Larsen Nov 15 '12 at 22:40
2

Here is some code implementing some of Zhenya's ideas. It uses

yhat = fftpack.rfft(yReal)
idx = (yhat**2).argmax()
freqs = fftpack.rfftfreq(N, d = (xReal[1]-xReal[0])/(2*pi))
frequency = freqs[idx]

to guess the main frequency of the data, and

amplitude = yReal.max()

to guess the amplitude.


import numpy as np
import scipy.optimize as optimize
import scipy.fftpack as fftpack
import matplotlib.pyplot as plt
pi = np.pi
plt.figure(figsize = (15, 5))

# generate a perfect data set (my real data have tiny error)
def mysine(x, a1, a2, a3):
    return a1 * np.sin(a2 * x + a3)

N = 5000
xmax = 10
xReal = np.linspace(0, xmax, N)
a1 = 200.
a2 = 2*pi/10.5  # omega, 10.5 is the period
a3 = np.deg2rad(10.) # 10 degree phase offset
print(a1, a2, a3)
yReal = mysine(xReal, a1, a2, a3) + 0.2*np.random.normal(size=len(xReal))

yhat = fftpack.rfft(yReal)
idx = (yhat**2).argmax()
freqs = fftpack.rfftfreq(N, d = (xReal[1]-xReal[0])/(2*pi))
frequency = freqs[idx]

amplitude = yReal.max()
guess = [amplitude, frequency, 0.]
print(guess)
(amplitude, frequency, phase), pcov = optimize.curve_fit(
    mysine, xReal, yReal, guess)

period = 2*pi/frequency
print(amplitude, frequency, phase)

xx = xReal
yy = mysine(xx, amplitude, frequency, phase)
# plot the real data
plt.plot(xReal, yReal, 'r', label = 'Real Values')
plt.plot(xx, yy , label = 'fit')
plt.legend(shadow = True, fancybox = True)
plt.show()

yields

(200.0, 0.5983986006837702, 0.17453292519943295)   # (a1, a2, a3)
[199.61981404516041, 0.61575216010359946, 0.0]     # guess
(200.06145097308041, 0.59841420869261097, 0.17487141943703263) # fitted parameters

Notice that by using fft, the guess for the frequency is already pretty close to final fitted parameter.

It seems you do not need to fix any of the parameters. By making the frequency guess closer to the actual value, optimize.curve_fit is able to converge to a reasonable answer.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677