
As I'm really struggling to get from R code to Python code, I would like to ask for some help. The code I want to use was provided to me on the Mathematics Stack Exchange:

https://math.stackexchange.com/questions/2205573/curve-fitting-on-dataset

I do understand what is going on, but I'm really having a hard time translating the R code, as I have never seen any of it before. I have written the function to return the sum of squares, but I'm stuck on how to use a function similar to R's `optim`. I also don't like the guesswork involved in choosing initial values; I would rather run and re-run some kind of `optim` function until I get the result I want, because my requirements for a near-perfect curve fit are really high.

def model (par,x):
    n = len(x)
    res = []
    for i in range(1,n):
        A0 = par[3] + (par[4]-par[1])*par[6] + (par[5]-par[2])*par[6]**2
        if(x[i] == par[6]):
            res[i] = A0 + par[1]*x[i] + par[2]*x[i]**2
        else:
            res[i] = par[3] + par[4]*x[i] + par[5]*x[i]**2
    return res

This is my model function...

def sum_squares (par, x, y):
    ss = sum((y-model(par,x))^2)
    return ss

And this is the sum of squares

But I have no idea on how to convert this:

 #I found these initial values with a few minutes of guess and check.
 par0 <- c(7,-1,-395,70,-2.3,10)
 sol <- optim(par= par0, fn=sqerror, x=x, y=y)$par

To Python code...
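For comparison, R's `optim` defaults to the Nelder-Mead method, and the closest scipy equivalent is `scipy.optimize.minimize` with `method='Nelder-Mead'`. A minimal sketch of those three lines, using synthetic placeholder data and a simple placeholder model (not the piecewise model from the question):

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic placeholder data; substitute the real x and y arrays.
x = np.linspace(0.0, 360.0, 50)
y = 0.1 + 0.001 * x

def sum_squares(par, x, y):
    # Simple quadratic model used only to make the sketch runnable;
    # substitute the piecewise model from the question.
    return np.sum((y - (par[0] + par[1]*x + par[2]*x**2))**2)

# Rough initial guess, in the spirit of the R code's par0.
par0 = np.array([7.0, -1.0, -395.0])
sol = minimize(sum_squares, par0, args=(x, y), method='Nelder-Mead').x
```

Like R's `sol <- optim(...)$par`, the `.x` attribute of the result holds the fitted parameter vector.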

    Careful with `^`. In Python `**` is the power operator so you need to write `x**2` to get x squared. – MB-F Mar 30 '17 at 12:54
  • Oh yes, of course, sorry about that one; I'll fix that error. But the real question is how to get these three lines into Python: `par0 <- c(7,-1,-395,70,-2.3,10)` and `sol <- optim(par= par0, fn=sqerror, x=x, y=y)$par` – Jarich Braeckevelt Mar 30 '17 at 12:55
  • You may find [this post](https://stackoverflow.com/questions/21288133/loading-rdata-files-into-python) and [this post](http://pandas.pydata.org/pandas-docs/stable/r_interface.html) useful. –  Mar 30 '17 at 12:55
  • @mikey Thanks for the suggestion, but I already have my data in Python; that isn't the problem. I just want to make the functions from the R code work on my Python data, so I want to convert the R code to Python code... – Jarich Braeckevelt Mar 30 '17 at 12:59

3 Answers


It seems I have been able to fix the problem.

import numpy as np

def model(par, x):
    n = len(x)
    res = np.array([])
    for i in range(0, n):
        A0 = par[2] + (par[3]-par[0])*par[5] + (par[4]-par[1])*par[5]**2
        if x[i] <= par[5]:
            res = np.append(res, A0 + par[0]*x[i] + par[1]*x[i]**2)
        else:
            res = np.append(res, par[2] + par[3]*x[i] + par[4]*x[i]**2)
    return res

def sum_squares (par, x, y):
    ss = sum((y-model(par,x))**2)
    print('Sum of squares = {0}'.format(ss))
    return ss

And then I used the functions as follow:

from scipy.optimize import least_squares

parameter = np.array([0.0, -8.0, 0.0018, 0.0018, 0, 200])
res = least_squares(sum_squares, parameter, bounds=(-360, 360), args=(x1, y1), verbose=1)

The only problem is that it doesn't produce the results I'm looking for. That is mainly because my x values span [0, 360] while the y values vary by only about 0.2, so it's a hard nut for this function to crack, and it produces this (poor) result:

[Image: plot of the poor fit]
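One common workaround for mismatched scales like this (not in the original answer, just a suggestion) is to rescale x into a range comparable to y before fitting, so the optimizer isn't juggling wildly different magnitudes. A sketch with hypothetical `x1`/`y1` arrays:

```python
import numpy as np

# Hypothetical data: angles in degrees, small-amplitude y values.
x1 = np.linspace(0.0, 360.0, 100)
y1 = 0.1 + 0.001 * np.sin(np.radians(x1))

# Rescale x into [0, 1], the same order of magnitude as y1.
x_scaled = x1 / 360.0

# Fit against x_scaled instead of x1; remember to convert any fitted
# breakpoint parameter back to the original units by multiplying by 360.
```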


I wrote an open-source Python package (BSD license) with a genetic-algorithm (Differential Evolution) front end to the scipy Levenberg-Marquardt solver; it works similarly to what you describe in your question. The GitHub URL is:

https://github.com/zunzun/pyeq3

It comes with a "user-defined function" example that's fairly easy to use:

https://github.com/zunzun/pyeq3/blob/master/Examples/Simple/FitUserDefinedFunction_2D.py

along with command-line, GUI, cluster, parallel, and web-based examples. You can install the package with "pip3 install pyeq3" to see if it might suit your needs.

James Phillips

I think that the range of x values [0, 360] and y values (which you say is ~0.2) is probably not the problem. Getting good initial values for the parameters is probably much more important.

In Python with numpy / scipy, you would definitely want to not loop over values of x but do something more like

def model(par, x):
    # Quadratic for x > par[5].
    res = par[2] + par[3]*x + par[4]*x**2
    A0  = par[2] + (par[3]-par[0])*par[5] + (par[4]-par[1])*par[5]**2
    # Quadratic for x <= par[5]; index both sides with the same mask
    # so the shapes match.
    lo = x <= par[5]
    res[lo] = A0 + par[0]*x[lo] + par[1]*x[lo]**2
    return res

It's not clear to me that that form is really what you want: why should A0 (a value independent of x added to a portion of the model) be so complicated and interdependent on the other parameters?

More importantly, your sum_squares() function is actually not what least_squares() wants: you should return the residual array; you should not do the sum of squares yourself. So, that should be something like

def residual(par, x, y):
    return y - model(par, x)

But most importantly, there is a conceptual problem that is probably going to plague this model: your par[5] is meant to represent a breakpoint where the model changes form. This is going to be very hard for these optimization routines to find. These routines generally make a very small change to each parameter value to estimate the derivative of the residual array with respect to that variable, in order to figure out how to change it. For a parameter that effectively acts as an integer, the small change in the initial value will have no effect at all, and the algorithm will not be able to determine the value of this parameter. With some of the scipy.optimize algorithms (notably leastsq) you can specify a scale for the relative change to make; with leastsq that option is called epsfcn. You may need to set it as high as 0.3 or 1.0 for fitting the breakpoint to work. Unfortunately, this cannot be set per variable, only per fit. You might need to experiment with this and other options to least_squares or leastsq.
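As a concrete illustration of the epsfcn suggestion, a sketch with synthetic data and hypothetical parameter values (none of these numbers come from the question):

```python
import numpy as np
from scipy.optimize import leastsq

x = np.linspace(0.0, 360.0, 200)

def model(par, x):
    # Piecewise quadratic with a breakpoint at par[5].
    res = par[2] + par[3]*x + par[4]*x**2
    A0 = par[2] + (par[3]-par[0])*par[5] + (par[4]-par[1])*par[5]**2
    lo = x <= par[5]
    res[lo] = A0 + par[0]*x[lo] + par[1]*x[lo]**2
    return res

def residual(par, x, y):
    # leastsq squares and sums this array itself.
    return y - model(par, x)

# Hypothetical "true" parameters used to generate synthetic data.
true_par = np.array([0.003, -1.0e-5, 0.1, -1.0e-3, 2.0e-6, 180.0])
y = model(true_par, x)

par0 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 150.0])
# epsfcn sets the relative step used for the finite-difference derivatives;
# a large value lets the solver "feel" changes in the breakpoint par[5].
best, ier = leastsq(residual, par0, args=(x, y), epsfcn=0.5)
```

Note that epsfcn applies to every parameter, so a value this large will also coarsen the derivative estimates for the well-behaved quadratic coefficients; it is a trade-off to experiment with.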

M Newville
  • @M Newville, Thanks for your answer, but my problem will still occur because the value of the change point is so much higher than the other values... I'll test some more; I get kind of decent results now, the sum of squares is around 0.012, but I want it to be even lower – Jarich Braeckevelt Mar 31 '17 at 07:34
  • the absolute value of any particular variable generally does not matter much. The Fortran code underlying `leastsq` attempts to take this into account. The *relative* size of variables can matter when they differ by 5 to 10 orders of magnitude. So, I doubt this is a problem for you. – M Newville Apr 01 '17 at 15:18