Here is my data in Excel. I want to fit a sine curve to this data.

Here is my code:

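import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Fitting function: offset + amplitude * sin(angular frequency * x + phase)
def func(x, offset, A, freq, phi):
    return offset + A * np.sin(freq * x + phi)

# Experimental x and y data points
# test_df is the DataFrame read from the Excel file
x_data = test_df['x_data']
y_data = test_df['y_data']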


#Plot input data points
plt.plot(x_data, y_data, 'bo', label='experimental-data')

# Initial guess for the parameters
initial_guess = [.38, 2.3, .76, 2.77]    

#Perform the curve-fit
popt, pcov = curve_fit(func, x_data, y_data, initial_guess)
print(popt)

#x values for the fitted function
x_fit = np.arange(0.0, 31, 0.01)

#Plot the fitted function
plt.plot(x_fit, func(x_fit, *popt), 'r')

plt.show()

This is the graph: [image: plot of the data points and the fitted curve]. I don't think this is the best fit, and I would like suggestions for improving it.

1 Answer

Well, your data does not seem to be a mathematical function: for the argument value 15, for example, there are multiple values (so what does f(15) equal?). Thus, it won't be a classical interpolation in this case. If you could normalize the data somehow, i.e. make a function out of it, then you could use numpy.

The simplest approach would be to add a small disturbance wherever the arguments' values are equal. Let's look at an example from your data:

4   0.0326
4   0.014
4   -0.0086
4   0.0067

So, as you can see, you can't tell what the relation's value is for f(4). You could disturb the arguments a bit, e.g.:

3.9     -0.0086
3.95    0.0067
4       0.014
4.05    0.0326

And so on for all such cases in your data file. The simplest approach would be to group these values by their x argument, sort them, and disturb them.

That would obviously introduce some error, but, well... you are curve fitting anyway, right?
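The group, sort and disturb step could be sketched roughly as follows. This is only one possible way to do it: the helper name `spread_duplicates`, the `step` size, the choice to centre the offsets on the original x, and the extra row at x = 5 are assumptions made for illustration; only the four values at x = 4 come from the question's data.

```python
import pandas as pd

def spread_duplicates(df, x_col="x_data", y_col="y_data", step=0.05):
    """Offset repeated x values by small, evenly spaced amounts.

    Rows are sorted by x and then by y. Inside each group of equal x the
    rows get offsets centred on zero, so the x values become unique as long
    as step * (group size) is smaller than the gap between distinct x values.
    """
    out = df.sort_values([x_col, y_col]).reset_index(drop=True)
    # Position of each row inside its group of equal x values: 0, 1, 2, ...
    rank = out.groupby(x_col).cumcount()
    # Number of rows in the group each row belongs to
    size = out.groupby(x_col)[y_col].transform("size")
    # Shift each x so that the group stays centred on the original value
    out[x_col] = out[x_col] + (rank - (size - 1) / 2) * step
    return out

# The four x = 4 rows are from the question; the x = 5 row is made up
sample = pd.DataFrame({
    "x_data": [4, 4, 4, 4, 5],
    "y_data": [0.0326, 0.014, -0.0086, 0.0067, 0.02],
})
spread = spread_duplicates(sample)
print(spread)
```

The disturbed x values differ slightly from the hand-written example above because the offsets here are centred on 4 rather than starting at 3.9.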

To formulate a sine, you have to know the amplitude, frequency and phase: f(x) = A * sin(F*x + p), where A is the amplitude, F is the (angular) frequency and p is the phase. NumPy and SciPy provide tools for this if you've got a proper data set prepared; see: How do I fit a sine curve to my data with pylab and numpy?
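As a minimal sketch of such a fit, the snippet below estimates starting values for the offset, A, F and p from the data and passes them to `scipy.optimize.curve_fit`. The data is synthetic and the helper `guess_sine_params` is hypothetical; its FFT-based frequency guess assumes evenly spaced x values with one y per x, which may not hold for the data in the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def sine(x, offset, A, F, p):
    # Same model as in the text, plus a constant offset
    return offset + A * np.sin(F * x + p)

def guess_sine_params(x, y):
    """Rough starting values; assumes evenly spaced x (illustration only)."""
    offset = np.mean(y)
    spectrum = np.abs(np.fft.rfft(y - offset))
    freqs = np.fft.rfftfreq(len(x), d=x[1] - x[0])
    peak = np.argmax(spectrum[1:]) + 1   # skip the zero-frequency bin
    A = np.std(y) * np.sqrt(2)           # amplitude of a pure sine with this spread
    F = 2 * np.pi * freqs[peak]          # cycles per unit -> angular frequency
    return [offset, A, F, 0.0]

# Synthetic, made-up data: a noisy sine with known parameters
rng = np.random.default_rng(0)
x = np.linspace(0, 30, 300)
true_params = (0.4, 2.0, 0.8, 1.0)
y = sine(x, *true_params) + rng.normal(scale=0.2, size=x.size)

p0 = guess_sine_params(x, y)
popt, pcov = curve_fit(sine, x, y, p0=p0)
print("initial guess:", p0)
print("fitted params:", popt)
```

A sine fit tends to be sensitive to the starting frequency, so a data-driven guess like this is often more reliable than hand-picked numbers.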

Marek Piotrowski
  • ....multiple values for one input....that's why you make a fit and not an interpolation.... – mikuszefski Aug 24 '20 at 06:50
  • Interesting perspective, appreciate your time. The x_data are dates, and the 15th of Jan, Feb and so on each have their own value. The SO answer you suggested, I already used it. It's good but not so good. Well... I am curve fitting anyway, right? Cheers! – Rasel Rahman Aug 24 '20 at 16:04
  • In a mathematical sense, I guess you and @mikuszefski are right. I just saw numpy's API, and in all the examples they assumed y to be a function of x, not just any relation, so I suggested the easiest solution possible. Additionally, if these are dates, then we're talking about a discrete domain; why would you need a curve in such a case? By the way, if the answer was helpful but does not directly provide a solution, I guess you could upvote it but not accept it. Not sure though. – Marek Piotrowski Aug 24 '20 at 16:20
  • I am sorry, but despite the fact that your answer got accepted, I'd like to point out that your suggestions do not make sense. In a real-life system (statistical questioning of 1000 persons n times, or measuring a physical value 1000 times under identical conditions) it is absolutely normal that one input parameter gives a statistically distributed output parameter. That is why interpolation does not make sense, and introducing artificial randomness on your input to avoid multiple y for the same x makes even less.... – mikuszefski Aug 25 '20 at 05:27
  • ...The fit process, most of the time a least-squares approach, exists to take care of exactly that problem by minimizing the quadratic distance to a theoretical "true" value. Also, having discrete measurement points does not mean that the underlying phenomenon is not continuous. – mikuszefski Aug 25 '20 at 05:27
  • This is why I suggested that the OP upvote it (if it was helpful) but not mark it as accepted. As I said, I really wouldn't like to argue. I saw numpy's API, and in all the examples the input data assumed a single value for each x, so I thought we could adjust the data set a bit, considering the fitting/interpolation will introduce an error anyway. I believe you could always post your own answer, right? – Marek Piotrowski Aug 25 '20 at 06:05
  • And in the context of discreteness: sure, but these are dates. If we've got measurements each day, what sort of information could we possibly get from fitting them to a curve? What happens in the hours in between? Or what happens in the future? I believe there are easier methods for such predictive modelling in a discrete domain, right? – Marek Piotrowski Aug 25 '20 at 06:09
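The point raised in the comments, that a least-squares fit can take several y values for the same x directly, can be checked with a small sketch. It reuses the question's `func` with `scipy.optimize.curve_fit` on made-up data containing four measurements per x value. The noise level, the true parameters and the starting values are assumptions chosen for illustration, not the OP's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, offset, A, freq, phi):
    return offset + A * np.sin(freq * x + phi)

rng = np.random.default_rng(1)

# Made-up data: 4 repeated measurements at each of 30 integer x values
x_data = np.repeat(np.arange(1, 31), 4).astype(float)
true_params = (0.0, 0.03, 0.5, 0.3)
y_data = func(x_data, *true_params) + rng.normal(scale=0.01, size=x_data.size)

# Starting values read off the synthetic data by eye:
# roughly one cycle every 12 x units, amplitude around 0.02
initial_guess = [0.0, 0.02, 2 * np.pi / 12, 0.0]

# No jitter or grouping is applied; repeated x values are passed as they are
popt, pcov = curve_fit(func, x_data, y_data, p0=initial_guess)

# One-sigma parameter uncertainties from the covariance matrix
perr = np.sqrt(np.diag(pcov))
print("fitted params:", popt)
print("uncertainties:", perr)
```

In this sketch the scatter among the repeated measurements is what the fit averages over, and it feeds into the uncertainties in `perr`.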