fitting data to fourier3 series always produce a straight line

Question

I have data where I want to fit the Fourier3 series, I looked to this answer: here and tried different algorithms from different packages (like symfit, and scipy). But when I plot the data, different packages give me get this result: enter image description here

Currently, I'm using the curve_fit package from scipy and here is my code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd

def fourier(x, *as_bs):
    sum_a = 0
    sum_b = 0
    j = 1
    w = as_bs[0]
    a0 = as_bs[1]
    for i in range(2, len(as_bs)-1, 2):
        sum_a += as_bs[i] * np.cos(j * w * x)
        sum_b += as_bs[i+1] * np.sin(j * w * x)
        j = j + 1

    return a0 + sum_a + sum_b

T = pd.read_excel('FS_data.xlsx')
A = pd.DataFrame(T)

xdata = np.array(A.iloc[:, 0])
ydata = np.array(A.iloc[:, 1])

# fits
popt, pcov = curve_fit(fourier, xdata, ydata, [np.random.rand(1)] * 8)
print(popt)

data_fit = fourier(ydata, *popt)
print(data_fit)

plt.plot(ydata)
plt.plot(data_fit, label='after fitting')
plt.legend()
plt.show()

So, my code basically will read random 8 numbers and assign them as initial guesses for (f, a0, a1, b1, a2, b2, a3, b3) respectively.

I tried to fit the data on Matlab to check if the data can be fitted with the fourier3 and the results there are great: enter image description here

I printed the output on both Python and Matlab to compare and here is the results for both: Python:

w = 5.66709943e-01
a0 = 3.80499132e+01 
a1 = 5.56883486e-04
b1 = -3.88408379e-04
a2 = -3.88408379e-04
b2 = 3.32951592e-04
a3 = 3.15641900e-04
b3 = 1.96414168e-04

Matlab:

   a0 =       38.07  (38.07, 38.08)
   a1 =      0.5352  (0.4951, 0.5753)
   b1 =     -0.5788  (-0.5863, -0.5714)
   a2 =     -0.3728  (-0.413, -0.3326)
   b2 =      0.5411  (0.492, 0.5901)
   a3 =      0.2357  (0.2226, 0.2488)
   b3 =     0.05895  (0.02773, 0.09018)
   w =   0.0003088

So as noted, only the value for a0 was correct, but the others are very far from Matlab. So why I'm getting this result in Python? What I'm doing wrong?

Here is the data for those who like to test it out:

https://docs.google.com/spreadsheets/d/18lL1iMZ3kdaqUUtRDLNRK4A3uCPzOrXt/edit?usp=sharing&ouid=112684448221465330517&rtpof=true&sd=true

Your data is in the range of 17500 and you are giving a starting value for omega in the range of 1. Just adapt this and it might already be enough. — mikuszefski, Dec 13 '21 at 06:51
I am quite sure that the line `data_fit = fourier(ydata, *popt)` should read `data_fit = fourier(xdata, *popt)`...then with starting values `[1/10000,38,1, 1, 1, 1, 1, 1]` works fine for me — mikuszefski, Dec 13 '21 at 14:07
@mikuszefski Yes, you are correct! So basically the function is affected by the initial guesses provided? I thought that won't affect it since in Matlab we don't pass an initial guess. Could you please just elaborate more on why you selected these values? Is it always we need to give initial guesses within the same range of the data? Please provide the answer in answer format so I can specify it as the correct answer to the question. Thank you for your help — abdullatif alnajim, Dec 13 '21 at 15:03

score 1 · Accepted Answer · answered Dec 14 '21 at 07:38

I am not into Matlab, so I don't know, which additional work the Matlab fit does to estimate starting values for a non-linear fit. I can say, though, that curve_fit does non at all, i.e. all values are assumed to be on the order of 1. The easiest way, would have been to rescale the x axis to the range [0, 2 pi]. Hence, the problem of the OP is, once again, wrong starting values. Rescaling requires, however, the knowledge that the main wave to be fitted is approximately the width of the data set. Moreover, we need to assume that all other fit parameters are also of the order 1. Luckily, this is the case, so this would have worked:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

xdat, ydat = np.loadtxt( "data.tsv", unpack=True, skiprows=1 )


def fourier(x, *as_bs):
    sum_a = 0
    sum_b = 0
    j = 1
    w = as_bs[0]
    a0 = as_bs[1]
    for i in range(2, len( as_bs ) - 1, 2 ):
        sum_a += as_bs[i] * np.cos( j * w * x )
        sum_b += as_bs[i+1] * np.sin( j * w * x )
        j = j + 1
    return a0 + sum_a + sum_b

"""
lets rescale the data to get the base frequency in the range of one
"""

xmin = min( xdat )
xmax = max( xdat )
xdat = ( xdat - xmin ) / (xmax - xmin ) * 2 * np.pi

popt, pcov = curve_fit(
    fourier,
    xdat, ydat,
    p0 = np.ones(8)
)
### here I assume that higher order are similar to lower orders
### but slightly smaller. ... hoping that the fit correts errors in
### this assumption

print(popt)
### scale back w noting that it scales inverse to x
print( popt[0] * 2 * np.pi  / (xmax - xmin ) )
data_fit = fourier( xdat, *popt )

If we cannot make the assumptions above, we may only assume that there is a base frequency with a dominant contribution to the signal (Note that this is not always true). In this case we can pre-calculate starting guesses in an non-iterative way.

The solution looks a bit more complicated:


import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import cumtrapz

xdat, ydat = np.loadtxt( "data.tsv", unpack=True, skiprows=1 )


def fourier(x, *as_bs):
    sum_a = 0
    sum_b = 0
    j = 1
    w = as_bs[0]
    a0 = as_bs[1]
    for i in range(2, len( as_bs ) - 1, 2 ):
        sum_a += as_bs[i] * np.cos( j * w * x )
        sum_b += as_bs[i+1] * np.sin( j * w * x )
        j = j + 1
    return a0 + sum_a + sum_b

#### initial guess
"""
This uses the fact that if y = a sin w t + b cos w t + c we have
int int y = -y/w^2 + c/2 t^2 + d t + e
i.e. we can get 1/w^2 as linear fit parameter without the danger of
a non-linear fit iterative process running into a local minimum
for details see:
https://scikit-guess.readthedocs.io/en/sine/_downloads/4b4ed1e691ff195be3ca73879a674234/Regressions-et-equations-integrales.pdf
"""

Sy = cumtrapz( ydat, xdat, initial=0 )
SSy = cumtrapz( Sy, xdat, initial=0 )

ST = np.array( [
    ydat, xdat**2, xdat, np.ones( len( xdat ) )
] )
S = np.transpose( ST )
eta = np.dot( ST, SSy )
A = np.dot( ST, S )
sol = np.linalg.solve( A, eta )

wFit = np.sqrt( -1 / sol[0] )
### linear parameters
"""
Once we have a good guess for w we can get starting guesses for
a, b and c from a standard linear fit
"""
ST = np.array( [
    np.sin( wFit * xdat ), np.cos( wFit * xdat ), np.ones( len( xdat ) )
])
S = np.transpose( ST )
eta = np.dot( ST, ydat )
A = np.dot( ST, S )
sol = np.linalg.solve( A, eta )

a1 = sol[0]
b1 = sol[1]
a0 = sol[2]

### final non-linear fit
"""
Now we can use the guesses from above as input for the final
non-linear fit. Hopefully, we are now close enough to the global minimum
and have the algorithm converge reasonably
"""
popt, pcov = curve_fit(
    fourier,
    xdat, ydat,
        p0=[
            wFit, a0, a1, b1,
            a1 / 2, b1 / 2,
            a1 / 4, b1 / 4
        ]
)
### here I assume that higher order are similar to lower orders
### but slightly smaller. ... hoping that the fit correts errors in
### this assumption

print(popt)
data_fit = fourier( xdat, *popt )

plt.plot( xdat, ydat, ls="", marker="o", ms=0.5, label="data" )
plt.plot( xdat, data_fit, label='fitting')
plt.legend()
plt.show()

Both providing basically the same solution, with the latter code being applicable to more cases with less assumptions.

fitting data to fourier3 series always produce a straight line

1 Answers1