
I have a numpy.ndarray, something like np.array([(1, 1), (2, 3), (3, 5), (4, 8), (5, 9), (6, 9), (7, 9)]). I'd like to find a curve-fitting method that can do the following two things.

  1. It can fit the scattered points to a line. This is not so hard; I found the same question here: python numpy/scipy curve fitting

  2. It can return a y-value, following the tendency of the curve, for an x-value beyond the range of the numpy.ndarray. For example, given the x-value 8, it should return a value of about 9.

What method should I use? Can KNN or SVM (SVR) solve this kind of problem?

I don't know if I have made this clear; I will edit my question if needed.

haojie
    You can use a simple linear regression. The question here is, what kind of functional form do you want to fit? – Sheldore Dec 29 '18 at 01:45
  • With scipy optimise, you would need to explicitly pass the function to which you want to fit the data. – Sheldore Dec 29 '18 at 01:48
  • @Bazingaa Here I want to fit a curve, linear regression may not be the answer. Can I use logistic regression instead? – haojie Dec 29 '18 at 02:06
  • You have a single independent variable so linear regression is what you want. Even for more than one independent variable you can use linear regression. It's hard to help you here unless you yourself know what functional form to fit your data to – Sheldore Dec 29 '18 at 02:32
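To see what the comments are getting at, here is a minimal sketch (not from the question or answer) of a plain linear fit with numpy.polyfit on this data. It extrapolates easily, but a straight line keeps rising past the plateau at y = 9, which is why a saturating functional form may suit this data better:

```python
import numpy as np

data = np.array([(1, 1), (2, 3), (3, 5), (4, 8), (5, 9), (6, 9), (7, 9)])
x, y = data[:, 0].astype(float), data[:, 1].astype(float)

# degree-1 polynomial (straight line) least-squares fit
coeffs = np.polyfit(x, y, 1)

# extrapolate beyond the observed x-range
y_at_8 = np.polyval(coeffs, 8.0)
print(y_at_8)  # the line keeps rising, well above the observed plateau of 9
```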

1 Answer


I got an OK fit to a sigmoidal equation, "y = a / (1.0 + exp(-(x-b)/c))", with parameters a = 9.25160014, b = 2.70654566, and c = 0.80626597, yielding RMSE = 0.2661 and R-squared = 0.9924. Here is the Python graphical fitter I used, with the scipy differential_evolution genetic algorithm supplying initial parameter estimates. The scipy implementation of that module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and it requires bounds within which to search; in this example those bounds are taken from the data maximum and minimum values.

[sigmoidal fit plot]

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

data = [(1, 1), (2, 3), (3, 5), (4, 8), (5, 9), (6, 9), (7, 9)]

# data to float arrays
xData = numpy.array([d[0] for d in data], dtype=float)
yData = numpy.array([d[1] for d in data], dtype=float)


def func(x, a, b, c): #sigmoidal curve fitting function
    return  a / (1.0 + numpy.exp(-1.0 * (x - b) / c))


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    minXY = min(minX, minY)
    maxXY = max(maxX, maxY)

    parameterBounds = []
    parameterBounds.append([minXY, maxXY]) # search bounds for a
    parameterBounds.append([minXY, maxXY]) # search bounds for b
    parameterBounds.append([minXY, maxXY]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# run the genetic algorithm to find initial parameter estimates within the bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without bounds, using the genetic algorithm's result as
# initial parameter estimates, in case the best-fit parameters lie outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
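To address the question's second point — predicting a y-value beyond the observed x-range — the fitted function can simply be evaluated at the new x. A short sketch, plugging the fitted parameter values quoted above into the same sigmoidal equation:

```python
import numpy

def func(x, a, b, c):  # sigmoidal equation from the answer
    return a / (1.0 + numpy.exp(-1.0 * (x - b) / c))

# fitted parameter values quoted in the answer text
a, b, c = 9.25160014, 2.70654566, 0.80626597

y_at_8 = func(8.0, a, b, c)
print(y_at_8)  # ~9.24, consistent with the plateau in the data
```

The sigmoid saturates toward a ≈ 9.25 as x grows, so the extrapolated value stays near the plateau rather than increasing without bound as a straight line would.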
James Phillips
    This answer is awesome. I may not use the same curve-fitting method as you do, but I can use it as a reference. Thanks a lot. – haojie Dec 29 '18 at 03:05
  • The real problem is to find a relatively simple equation that gives a good fit to the data. For the equation search I used my zunzun.com Python open source online curve fitter's "function finder" - the site has hundreds of known, named equations for the equation search and also uses the Differential Evolution genetic algorithm to provide initial parameter estimates for the non-linear fitter, which is what makes the equation search possible. Without initial parameter estimates from the genetic algorithm, the site would be limited to searching through linear equations only. – James Phillips Dec 29 '18 at 03:11