
I am trying to show that economies follow a relatively sinusoidal growth pattern. I am building a Python simulation to show that even when we let some degree of randomness take hold, we can still produce something relatively sinusoidal.

I am happy with the data I'm producing, but now I'd like to find a way to get a sine graph that pretty closely matches the data. I know you can do a polynomial fit, but can you do a sine fit?

ChapmIndustries

6 Answers


Here is a parameter-free fitting function fit_sin() that does not require a manual guess of the frequency:

import numpy, scipy.optimize

def fit_sin(tt, yy):
    '''Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc"'''
    tt = numpy.array(tt)
    yy = numpy.array(yy)
    ff = numpy.fft.fftfreq(len(tt), (tt[1]-tt[0]))   # assume uniform spacing
    Fyy = abs(numpy.fft.fft(yy))
    guess_freq = abs(ff[numpy.argmax(Fyy[1:])+1])   # excluding the zero frequency "peak", which is related to offset
    guess_amp = numpy.std(yy) * 2.**0.5
    guess_offset = numpy.mean(yy)
    guess = numpy.array([guess_amp, 2.*numpy.pi*guess_freq, 0., guess_offset])

    def sinfunc(t, A, w, p, c):  return A * numpy.sin(w*t + p) + c
    popt, pcov = scipy.optimize.curve_fit(sinfunc, tt, yy, p0=guess)
    A, w, p, c = popt
    f = w/(2.*numpy.pi)
    fitfunc = lambda t: A * numpy.sin(w*t + p) + c
    return {"amp": A, "omega": w, "phase": p, "offset": c, "freq": f, "period": 1./f, "fitfunc": fitfunc, "maxcov": numpy.max(pcov), "rawres": (guess,popt,pcov)}

The initial frequency guess is taken from the peak of the FFT of the data (excluding the zero-frequency bin, which corresponds to the offset). The fit is almost perfect as long as there is a single dominant frequency.

import pylab as plt

N, amp, omega, phase, offset, noise = 500, 1., 2., .5, 4., 3
#N, amp, omega, phase, offset, noise = 50, 1., .4, .5, 4., .2
#N, amp, omega, phase, offset, noise = 200, 1., 20, .5, 4., 1
tt = numpy.linspace(0, 10, N)
tt2 = numpy.linspace(0, 10, 10*N)
yy = amp*numpy.sin(omega*tt + phase) + offset
yynoise = yy + noise*(numpy.random.random(len(tt))-0.5)

res = fit_sin(tt, yynoise)
print( "Amplitude=%(amp)s, Angular freq.=%(omega)s, phase=%(phase)s, offset=%(offset)s, Max. Cov.=%(maxcov)s" % res )

plt.plot(tt, yy, "-k", label="y", linewidth=2)
plt.plot(tt, yynoise, "ok", label="y with noise")
plt.plot(tt2, res["fitfunc"](tt2), "r-", label="y fit curve", linewidth=2)
plt.legend(loc="best")
plt.show()

The result is good even with high noise:

Amplitude=1.00660540618, Angular freq.=2.03370472482, phase=0.360276844224, offset=3.95747467506, Max. Cov.=0.0122923578658

[Result plots: with noise, low frequency, high frequency]

unsym

  • Dear unsym, I tried to run your code but unfortunately I receive the following message when I try to plot the function: TypeError: 'numpy.float64' object cannot be interpreted as an integer. Do you have any idea how to solve this? – vicemagui Sep 27 '18 at 21:08
  • I have just run the code verbatim as is and it worked without any error, using python 3.6 (in a jupyter notebook with matplotlib "inline" plotting turned on in the notebook) – John Collins Oct 06 '21 at 10:09
  • what is the technique called for estimating amplitude used in this code? – Rima May 18 '23 at 16:05

You can use the least-squares optimization function in scipy to fit any arbitrary function to another. In the case of fitting a sine function, the three parameters to fit are the offset ('a'), amplitude ('b') and phase ('c').

As long as you provide a reasonable first guess of the parameters, the optimization should converge well. Fortunately for a sine function, first estimates of two of these are easy: the offset can be estimated by taking the mean of the data, and the amplitude from the RMS (for a pure sine, amplitude ≈ standard deviation × sqrt(2)).

Note: as a later edit, frequency fitting has also been added. This does not work very well (it can lead to extremely poor fits). Thus, use it at your discretion; my advice would be to not use frequency fitting unless the frequency error is smaller than a few percent.

This leads to the following code:

import numpy as np
from scipy.optimize import leastsq
import pylab as plt

N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.15247 # Optional!! Advised not to use
data = 3.0*np.sin(f*t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise

guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)/(2**0.5)
guess_phase = 0
guess_freq = 1
guess_amp = 1

# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_std*np.sin(t+guess_phase) + guess_mean

# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*np.sin(x[1]*t+x[2]) + x[3] - data
est_amp, est_freq, est_phase, est_mean = leastsq(optimize_func, [guess_amp, guess_freq, guess_phase, guess_mean])[0]

# recreate the fitted curve using the optimized parameters
fine_t = np.arange(0, max(t), 0.1)
data_fit = est_amp*np.sin(est_freq*fine_t+est_phase) + est_mean

plt.plot(t, data, '.')
plt.plot(t, data_first_guess, label='first guess')
plt.plot(fine_t, data_fit, label='after fitting')
plt.legend()
plt.show()


Edit: I assumed that you know the number of periods in the sine wave. If you don't, it's somewhat trickier to fit. You can try to guess the number of periods by manual plotting, or treat it as an additional parameter to optimize, as in the sketch below.
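
If you really don't know the frequency at all, one crude but workable option (not part of the original answer, just a sketch) is to scan a range of candidate frequencies, run the same leastsq fit starting from each one, and keep the parameters with the smallest residual. The candidate range below is an arbitrary assumption and should be adapted to your data:

import numpy as np
from scipy.optimize import leastsq

def fit_with_unknown_freq(t, data, freq_candidates=np.linspace(0.1, 10, 100)):
    # rough starting points for the other parameters
    guess_mean = np.mean(data)
    guess_amp = np.std(data) * 2**0.5
    residual = lambda p: p[0]*np.sin(p[1]*t + p[2]) + p[3] - data
    best_params, best_err = None, np.inf
    # try each candidate frequency and keep the fit with the smallest residual
    for f0 in freq_candidates:
        params, _ = leastsq(residual, [guess_amp, f0, 0.0, guess_mean])
        err = np.sum(residual(params)**2)
        if err < best_err:
            best_params, best_err = params, err
    return best_params  # [amplitude, frequency, phase, offset]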

Dhara
  • This solution, though accepted by OP, seems to skip over the trickiest part: the _frequency_ `f` as in `y = Amplitude*sin(frequency*x + Phase) + Offset`. How well does this method work if `f` is unknown? – chux - Reinstate Monica Nov 02 '13 at 18:28
  • Is there a reason not to use scipy's curve_fit function? I'm guessing there's something about that initial guess and/or curve_fit making assumptions about the function. – cursa Jun 16 '14 at 12:44
  • I think the order of initial parameter values you provide the function is wrong. What about `est_a, est_b, est_c = leastsq(optimize_func, [guess_b, guess_a, guess_c])[0]`? For clarity, I would suggest replacing _a with _offset, _b with _amp, and _c with _phase everywhere and use increasing order of x[i] in your lambda. – chadwick.boulay Jul 25 '14 at 17:28

A more user-friendly option is scipy's curve_fit function. Here is an example:

import numpy as np
from scipy.optimize import curve_fit
import pylab as plt

N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
data = 3.0*np.sin(t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise

guess_freq = 1
guess_amplitude = 3*np.std(data)/(2**0.5)
guess_phase = 0
guess_offset = np.mean(data)

p0=[guess_freq, guess_amplitude,
    guess_phase, guess_offset]

# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset

# now do the fit
fit = curve_fit(my_sin, t, data, p0=p0)

# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)

# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])

plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()

Vasco
  • @IceArdor: Can you add a working code example of the solvers you propose? – Vasco Oct 23 '17 at 12:12
  • @Vasco what if I have an x_train of m*n shape and a y_train of m*1 shape? In that case I receive this error: ValueError: operands could not be broadcast together with shapes (38563,54) (38563,). So what should I do now? – sak Jul 13 '19 at 08:24
  • @Vasco Can you show me some example or if can post your answer here with some example code it would be very helpful: https://stackoverflow.com/questions/57027040/regressing-periodic-data-with-sklearn – sak Jul 17 '19 at 03:47

The current methods to fit a sine curve to a given data set require a first guess of the parameters, followed by an iterative process. This is a non-linear regression problem.

A different method consists in transforming the non-linear regression into a linear regression by means of a convenient integral equation. Then there is no need for an initial guess and no need for an iterative process: the fit is obtained directly.

In case of the function y = a + r*sin(w*x+phi) or y=a+b*sin(w*x)+c*cos(w*x), see pages 35-36 of the paper "Régression sinusoidale" published on Scribd

In case of the function y = a + p*x + r*sin(w*x+phi) : pages 49-51 of the chapter "Mixed linear and sinusoidal regressions".

In case of more complicated functions, the general process is explained in the chapter "Generalized sinusoidal regression" pages 54-61, followed by a numerical example y = r*sin(w*x+phi)+(b/x)+c*ln(x), pages 62-63
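
For readers who cannot access the paper, here is a rough sketch of the idea for the y = a + b*sin(w*x) + c*cos(w*x) case (my own condensed reading, not JJacquelin's code, so treat it as illustrative only). The double cumulative integral SS of y satisfies y = -w^2*SS + (terms in x^2, x, 1), so one linear regression gives w, and a second linear regression with w fixed gives a, b and c:

import numpy as np

def sine_fit_integral(x, y):
    # assumes sorted x, reasonably dense sampling, and y = a + b*sin(w*x) + c*cos(w*x)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # cumulative and double cumulative integrals of y (trapezoidal rule)
    S = np.concatenate(([0.0], np.cumsum(0.5*(y[1:] + y[:-1])*np.diff(x))))
    SS = np.concatenate(([0.0], np.cumsum(0.5*(S[1:] + S[:-1])*np.diff(x))))

    # step 1: regress y on [SS, x^2, x, 1]; the SS coefficient equals -w^2
    coef1 = np.linalg.lstsq(np.column_stack((SS, x**2, x, np.ones_like(x))), y, rcond=None)[0]
    w = np.sqrt(-coef1[0])   # requires coef1[0] < 0, i.e. genuinely sinusoidal data

    # step 2: with w fixed, a, b and c follow from ordinary linear least squares
    a, b, c = np.linalg.lstsq(np.column_stack((np.ones_like(x), np.sin(w*x), np.cos(w*x))), y, rcond=None)[0]
    return a, b, c, w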

JJacquelin
  • By chance, do you happen to have a reference for damped sinusoids' transformation? – Sapiens Sep 20 '20 at 22:39
  • +1 this looks very interesting, the problem is that the reference is in French. Does anyone know some English resource (could be anything from blog post / paper / code) that illustrates the approach in a bit more detail? If converting to linear regression is indeed possible, I'm wondering why this approach is not more popular? – bluenote10 Sep 22 '20 at 09:42
  • Ah I've found a great resource: [scikit-guess](https://github.com/madphysicist/scikit-guess). Not only does it provide an implementation, it also comes with an entire [translation](https://scikit-guess.readthedocs.io/en/latest/appendices/references.html) of the paper. – bluenote10 Sep 23 '20 at 09:43
  • @nopeva. I will try to answer your question, but it is difficult to avoid confusion between the various cases and examples. In order to avoid misunderstanding, I think it would be better for you to open a new question specific to your actual problem, with an example of data and with the function to be fitted. Then I could answer with the details of the calculation. – JJacquelin Jul 24 '23 at 10:06
  • @nopeva. The full answer to your question (with the data) about fitting the sum of two sinusoidal functions is ready to be posted. But I cannot find where your question is now. Is your question closed or deleted? – JJacquelin Jul 26 '23 at 13:38
  • @nopeva. Good news: with your data, the quality of the fit using the integral-equation method is almost the same as with usual non-linear regression, but of course without requiring initial guessed values, which is the advantage over iterative non-linear regression. – JJacquelin Jul 27 '23 at 06:35
  • @JJacquelin thanks for taking the time to find a solution. I have reopened the question. – nopeva Jul 27 '23 at 21:40

All the above answers are based on curve fitting, and most use an iterative method; they all work very nicely, but I wanted to add a different approach using an FFT. Here we transform the data, set all but the peak frequency components to zero, and then do the inverse transform. Note that you probably want to remove the data mean (and detrend) before doing the FFT, and then you can add those back in afterwards.

import numpy as np
import pylab as plt

# fake data

N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.05
data = 3.0*np.sin(f*t+0.001) + np.random.randn(N) # create artificial data with noise

# FFT: keep only the dominant frequency bin (and its conjugate), then invert
mfft = np.fft.fft(data)
imax = np.argmax(np.absolute(mfft))   # index of the dominant peak
mask = np.zeros_like(mfft)
mask[imax] = 1
mask[-imax] = 1                       # keep the conjugate bin so the inverse transform is real
mfft *= mask
fdata = np.fft.ifft(mfft).real


plt.plot(t, data, '.')
plt.plot(t, fdata,'.', label='FFT')
plt.legend()
plt.show()


ClimateUnboxed

If you already know the frequency, you can do a linear fit, which is computationally more efficient than the nonlinear fitting methods. As @JJacquelin points out, no initial guesses are required.

Explicitly, you'd fit y = a + b*sin(x) + c*cos(x) to the data. Note that this is equivalent to a + A*sin(x + phi) by trig identities, but here it is expressed in a way that is linear in the fit parameters (though not in x and y). Thus we can use a linear fit in Python.

Just pretend that x1 = sin(x) and x2 = cos(x) are inputs, and use a linear fitting function on y = a + b*x1 + c*x2.

To do that use

import numpy as np
from sklearn.linear_model import LinearRegression

reg = LinearRegression()

x = # your x values list
y = # your y values list
X = np.column_stack((np.sin(x), np.cos(x)))

reg.fit(X, y)

you can access the fit parameters by:

a = reg.intercept_
b = reg.coef_[0]
c = reg.coef_[1]
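
If you prefer the result in the usual amplitude/phase form y = a + A*sin(x + phi), the coefficients convert directly via the trig identity mentioned above (this conversion is my addition, continuing from the snippet above):

# convert b*sin(x) + c*cos(x) into A*sin(x + phi)
A = np.hypot(b, c)        # amplitude, sqrt(b**2 + c**2)
phi = np.arctan2(c, b)    # phase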

ions me