6

I'm trying to determine the best model function and parameters to use for real world data. I have several data sets that all exhibit a similar exponential decay and I want to calculate the fitted function parameters for each data set.

The largest data set varies from 1 to about 1,000,000 in the x-axis and from 0 to about 10,000 in the y-axis.

I'm new to Numpy and Scipy so I've tried adapting the code from this question to my data but without success: fitting exponential decay with no initial guessing

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import scipy.optimize

x = np.array([   1.,    4.,    9.,   16.,   25.,   36.,   49.,   64.,   81.,  100.,  121.,
        144.,  169.,  196.,  225.,  256.,  289.,  324.,  361.,  400.,  441.,  484.,
        529.,  576.,  625.,  676.,  729.,  784.,  841.,  900.,  961., 1024., 1089.,
       1156., 1225., 1296., 1369., 1444., 1521., 1600., 1681., 1764., 1849., 1936.,
       2025., 2116., 2209., 2304., 2401., 2500., 2601., 2704., 2809., 2916., 3025.,
       3136., 3249., 3364., 3481., 3600., 3721., 3844., 3969., 4096., 4225., 4356.,
       4489., 4624., 4761., 4943.])

y = np.array([3630., 2590., 2063., 1726., 1484., 1301., 1155., 1036.,  936.,  851.,  778.,
        714.,  657.,  607.,  562.,  521.,  485.,  451.,  421.,  390.,  362.,  336.,
        312.,  293.,  279.,  265.,  253.,  241.,  230.,  219.,  209.,  195.,  183.,
        171.,  160.,  150.,  142.,  134.,  127.,  120.,  114.,  108.,  102.,   97.,
         91.,   87.,   83.,   80.,   76.,   73.,   70.,   67.,   64.,   61.,   59.,
         56.,   54.,   51.,   49.,   47.,   45.,   43.,   41.,   40.,   38.,   36.,
         35.,   33.,   31.,   30.])

# Define the model function
def model_func(x, A, K, C):
    return A * np.exp(-K * x) + C

# Optimise the curve
opt_parms, parm_cov = sp.optimize.curve_fit(model_func, x, y, maxfev=1000)

# Fit the parameters to the data
A, K, C = opt_parms
fit_y = model_func(x, A, K, C)

# Visualise the original data and the fitted function
plt.clf()
plt.title('Decay Data')
plt.plot(x, y, 'r.', label='Actual Data\n')
plt.plot(x, fit_y, 'b-', label='Fitted Function:\n $y = %0.2f e^{%0.2f t} + %0.2f$' % (A, K, C))
plt.legend(bbox_to_anchor=(1, 1), fancybox=True, shadow=True)
plt.show()

decay-data

When I run this code using Python 2.7 (on Windows 7 64-bit) I get the error message RuntimeWarning: overflow encountered in exp. The image above shows the problem with the function not fitting my data.

Is the model function I'm using correct for my data? And if so, how can I calculate the fitted function parameters better so that I can use them with new data?

Christian Seitz
  • 728
  • 6
  • 15
Dan
  • 95
  • 1
  • 5
  • 1
    possible duplicate of [Overflow Error in Python's numpy.exp function](https://stackoverflow.com/q/40726490/8128190)? – singrium Oct 09 '19 at 11:20
  • The answer to that question seems to recommend using a `float128` but I dont have that type on Windows. – Dan Oct 09 '19 at 15:23

1 Answers1

5

I think you need to take the log of the X data. Here is my plot for quadratic logarithmic equation "y = a + b * np.log(x) + c * np.log(x) ** 2". The scipy default initial parameter estimates of all 1.0 worked fine for me, giving RMSE = 4.020 and R-squared = 0.9999.

plot

UPDATE: added Python graphical fitter

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xData = numpy.array([   1,    4,    9,   16,   25,   36,   49,   64,   81,  100,  121,
        144,  169,  196,  225,  256,  289,  324,  361,  400,  441,  484,
        529,  576,  625,  676,  729,  784,  841,  900,  961, 1024, 1089,
       1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936,
       2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025,
       3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356,
       4489, 4624, 4761, 4943], dtype=float)

yData = numpy.array([3630, 2590, 2063, 1726, 1484, 1301, 1155, 1036,  936,  851,  778,
        714,  657,  607,  562,  521,  485,  451,  421,  390,  362,  336,
        312,  293,  279,  265,  253,  241,  230,  219,  209,  195,  183,
        171,  160,  150,  142,  134,  127,  120,  114,  108,  102,   97,
         91,   87,   83,   80,   76,   73,   70,   67,   64,   61,   59,
         56,   54,   51,   49,   47,   45,   43,   41,   40,   38,   36,
         35,   33,   31,   30], dtype=float)


def func(x, a, b, c): # simple quadratic example
    return a + b*numpy.log(x) + c*numpy.log(x)**2


# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0])

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
James Phillips
  • 4,526
  • 3
  • 13
  • 11
  • 1
    Amazing! Thanks so much James. Changing my function to your equation seems to work. Can I ask what Python code you used to calculate the RMSE and R-squared values? – Dan Oct 09 '19 at 15:22
  • 1
    I have added the Python 3 code I used, see "UPDATE" in my answer. The code includes the calculation of the fit statistics. – James Phillips Oct 09 '19 at 18:12