3

I am trying to fit some data using scipy.optimize.curve_fit. I have read the documentation and also this StackOverflow post, but neither seem to answer my question.

I have some data which is simple, 2D data which looks approximately like a trig function. I want to fit it with a general trig function using scipy.

My approach is as follows:

from __future__ import division
import numpy as np
from scipy.optimize import curve_fit



#Load the data
data = np.loadtxt('example_data.txt')
t = data[:,0]
y = data[:,1]


#define the function to fit
def func_cos(t,A,omega,dphi,C):
    # A is the amplitude, omega the frequency, dphi and C the horizontal/vertical shifts
    return A*np.cos(omega*t + dphi) + C

#do a scipy fit
popt, pcov = curve_fit(func_cos, t,y)

#Plot fit data and original data
fig = plt.figure(figsize=(14,10))
ax1 = plt.subplot2grid((1,1), (0,0))

ax1.plot(t,y)
ax1.plot(t,func_cos(t,*popt))

This outputs:

enter image description here

where blue is the data orange is the fit. Clearly I am doing something wrong. Any pointers?

user1887919
  • 829
  • 2
  • 9
  • 24

1 Answers1

5

If no values are provided for initial guess of the parameters p0 then a value of 1 is assumed for each of them. From the docs:

p0 : array_like, optional
Initial guess for the parameters (length N). If None, then the initial values will all be 1 (if the number of parameters for the function can be determined using introspection, otherwise a ValueError is raised).

Since your data has very large x-values and very small y-values an initial guess of 1 is far from the actual solution and hence the optimizer does not converge. You can help the optimizer by providing suitable initial parameter values that can be guessed / approximated from the data:

  • Amplitude: A = (y.max() - y.min()) / 2
  • Offset: C = (y.max() + y.min()) / 2
  • Frequency: Here we can estimate the number of zero crossing by multiplying consecutive y-values and check which products are smaller than zero. This number divided by the total x-range gives the frequency and in order to get it in units of pi we can multiply that number by pi: y_shifted = y - offset; oemga = np.pi * np.sum(y_shifted[:-1] * y_shifted[1:] < 0) / (t.max() - t.min())
  • Phase shift: can be set to zero, dphi = 0

So in summary, the following initial parameter guess can be used:

offset = (y.max() + y.min()) / 2
y_shifted = y - offset
p0 = (
    (y.max() - y.min()) / 2,
    np.pi * np.sum(y_shifted[:-1] * y_shifted[1:] < 0) / (t.max() - t.min()),
    0,
    offset
)
popt, pcov = curve_fit(func_cos, t, y, p0=p0)

Which gives me the following fit function:

Fit

a_guest
  • 34,165
  • 12
  • 64
  • 118
  • That is perfect, thanks! Is it just me or does the fit seem slightly vertically displaced though? – user1887919 Dec 11 '19 at 20:25
  • 2
    @user1887919 It looks like but this is the version that minimizes the difference between the two functions. Around the peaks it seems the fit function is shifted but this has the benefit that the slopes are in very good agreement. Considering that the peaks make only a small portion of the x-range while the slopes make a great portion, this way the difference is minimized. You can try and set the initial guess for the offset to something like `amplitude + offset` but it will again converge to this solution. – a_guest Dec 11 '19 at 20:29