4

I'm trying to make a piecewise linear fit consisting of 3 pieces whereof the first and last pieces are constant. As you can see in this figure figure

don't get the expected fit, since the fit doesn't capture the 3 linear pieces clearly visual from the original data points.

I've tried following this question and expanded it to the case of 3 pieces with the two constant pieces, but I must have done something wrong.

Here is my code:

from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = [16, 6]

x = np.arange(0, 50, dtype=float)
y = np.array([50 for i in range(10)]
             + [50 - (50-5)/31 * i for i in range(1, 31)]
             + [5 for i in range(10)],
             dtype=float)

def piecewise_linear(x, x0, y0, x1, y1):
    return np.piecewise(x,
                        [x < x0, (x >= x0) & (x < x1), x >= x1],
                        [lambda x:y0, lambda x:(y1-y0)/(x1-x0)*(x-x0)+y0, lambda x:y1])

p , e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(0, 50, 101)

plt.plot(x, y, "o", label='original data')
plt.plot(xd, piecewise_linear(xd, *p), label='piecewise linear fit')
plt.legend()

The accepted answer to the previous mentioned question suggest looking at segments_fit.ipynb for the case of N parts, but following that it doesn't seem that I can specify, that the first and last pieces should be constant.

Furthermore I do get the following warning:

OptimizeWarning: Covariance of the parameters could not be estimated

What do I do wrong?

Bob
  • 13,867
  • 1
  • 5
  • 27
afd
  • 65
  • 6
  • Please provide the "original data" as in your plot. – Lukas S Jan 14 '22 at 13:34
  • The "original data" is the values stored in the variables x and y in the code I provided. Running the code should produce the plot. – afd Jan 14 '22 at 14:17
  • Is there any noise? If so, a nice solution can be found [here](https://www.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf) on page 16 – mikuszefski Jan 14 '22 at 15:49
  • Yes, there is noise in the data that I really want to fit, but not in the simulated data i provided in the code. Maybe it was a bad choice not to include the noise in the simulated data. I just wanted to make it work before fitting to the true data. – afd Jan 18 '22 at 13:21

3 Answers3

2

You could directly copy the segments_fit implementation

from scipy import optimize

def segments_fit(X, Y, count):
    xmin = X.min()
    xmax = X.max()

    seg = np.full(count - 1, (xmax - xmin) / count)

    px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
    py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])

    def func(p):
        seg = p[:count - 1]
        py = p[count - 1:]
        px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
        return px, py

    def err(p):
        px, py = func(p)
        Y2 = np.interp(X, px, py)
        return np.mean((Y - Y2)**2)

    r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
    return func(r.x)

Then you apply it as follows

import numpy as np;

# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40)

# apply the segment fit
fx, fy = segments_fit(x, y, 3)

This will give you (fx,fy) the corners your piecewise fit, let's plot it

import matplotlib.pyplot as plt

# show the results
plt.figure(figsize=(8, 3))
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.')
plt.legend(['fitted line', 'given points'])

piecewise linear fit

EDIT: Introducing constant segments

As mentioned in the comments the above example doesn't guarantee that the output will be constant in the end segments.

Based on this implementation the easier way I can think is to restrict func(p) to do that, a simple way to ensure a segment is constant, is to set y[i+1]==y[i]. Thus I added xanchor and yanchor. If you give an array with repeated numbers you can bind multiple points to the same value.

from scipy import optimize

def segments_fit(X, Y, count, xanchors=slice(None), yanchors=slice(None)):
    xmin = X.min()
    xmax = X.max()
    seg = np.full(count - 1, (xmax - xmin) / count)

    px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
    py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])

    def func(p):
        seg = p[:count - 1]
        py = p[count - 1:]
        px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
        py = py[yanchors]
        px = px[xanchors]
        return px, py

    def err(p):
        px, py = func(p)
        Y2 = np.interp(X, px, py)
        return np.mean((Y - Y2)**2)

    r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
    return func(r.x)

I modified a little the data generation to make it more clear the effect of the change

import matplotlib.pyplot as plt
import numpy as np;

# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40) + np.random.randn(len(x)) + 0.25 * x
# apply the segment fit
fx, fy = segments_fit(x, y, 3)
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.k')
# apply the segment fit with some consecutive points having the 
# same anchor
fx, fy = segments_fit(x, y, 3, yanchors=[1,1,2,2])
plt.plot(fx, fy, 'o--r')
plt.legend(['fitted line', 'given points', 'with const segments'])

enter image description here

Bob
  • 13,867
  • 1
  • 5
  • 27
1

This i consider a funny non-linear approach that works quite well. Note that even though this is highly non-linear it approximates the linear behavior very well. Moreover, the fit parameters provide the linear results. Only for the offset b a little transformation and according error propagation is required. (Also, I don't care about the value of p as long as it is somewhat larger than 5)

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
np.set_printoptions( linewidth=250, precision=4)
np.set_printoptions( linewidth=250, precision=4)

### piecewise linear function for data generation
def pwl( x, m, b, a1, a2 ):
    if x < a1:
        out = pwl( a1, m, b, a1, a2 )
    elif x > a2:
        out = pwl( a2, m, b, a1, a2 )
    else:
        out = m * x + b
    return out

### non-linear approximation
def func( x, m, b, a1, a2, p ):
    out = b + np.log(
    1 / ( 1 + np.exp( -m *( x - a1 ) )**p )
    ) / p - np.log(
    1 / ( 1 + np.exp( -m * ( x - a2 ) )**p )
    ) / p
    return out

### some data
nn = 36
xdata = np.linspace( -5, 19, nn )
ydata = np.fromiter( (pwl( x, -2.1, 11.6, -1.1, 12.7 ) for x in xdata ), float)
ydata += np.random.normal( size=nn, scale=0.2)
### dense grid for printing
xth = np.linspace( -5, 19, 150 )
###fitting
popt, cov = curve_fit( func, xdata, ydata, p0=[-2, 11, -1, 10, 1])
mF, betaF, a1F, a2F, pF = popt
bF = betaF - mF * a1F
sol=( mF, bF, a1F, a2F, pF  )
### transforming the covariance due to the b' -> b mapping
J1 = np.identity(5)
J1[1,0] = -popt[2]
J1[1,2] = -popt[0]
cov2 = np.dot( J1, np.dot( cov, np.transpose( J1 ) ) )
### results
print( cov2 )
for i, v in enumerate( ("m", "b", "a1", "a2", "p" ) ):
    print( "{:>2} = {:+2.4e} ± {:0.4e}".format( v, sol[i], np.sqrt( cov2[i,i] ) ) )

### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xdata, ydata, ls='', marker='+' )
ax.plot( xth, func( xth, -2, 11, -1, 10, 1 ) )
ax.plot( xth, func( xth, *popt ) )
plt.show()

Providing

[[ 1.3553e-04 -7.6291e-04 -4.3488e-04  4.5624e-04  1.2619e-01]
 [-7.6291e-04  6.4126e-03  3.4560e-03 -1.5573e-03 -7.4983e-01]
 [-4.3488e-04  3.4560e-03  3.4741e-03 -9.8284e-04 -4.2344e-01]
 [ 4.5624e-04 -1.5573e-03 -9.8284e-04  3.0842e-03 -5.2739e+00]
 [ 1.2619e-01 -7.4983e-01 -4.2344e-01 -5.2739e+00  3.1583e+05]]

 m = -2.0810e+00 ± 9.7718e-03
 b = +1.1463e+01 ± 6.7217e-02
a1 = -1.2545e+00 ± 5.0384e-02
a2 = +1.2739e+01 ± 4.7176e-02
 p = +1.6840e+01 ± 2.9872e+02

and

enter image description here

mikuszefski
  • 3,943
  • 1
  • 25
  • 38
  • I must say that I'm not sure what's going on in this approach. But since it is a non-linear-approach it makes rounded breaks instead of sudden breaks. What I really want is to find the values of x0, y0, x1 and y1 corresponding to the coordinates of the two points where the curve breaks. But with rounded breaks this seems somewhat imprecise. – afd Jan 18 '22 at 13:38
  • @afd The idea is to swap forth and back to linear by `log( exp( ) )` but adding a tiny perturbation to allow for the transition from one linear behavior to the next. with increasing `p` the corners become arbitrarily sharp, but especially with noisy data, the position of the kink is only within `dx`. So I do not think that the soft part (being smaller than `dx`) is a problem. Alternatively, you could make an iterative fit with given `p` increasing it every step. If you remove the noise you'll get the true linear parameters with 5 digit precision. – mikuszefski Jan 18 '22 at 14:48
  • @afd Hence, you get the linear result, but due to the fact that the function is continuously differentiable, the position of the corners do not get stuck between to data points. If required, one could use the result of this fit only to split the data into the piecewise parts, but as said before, I do not think that this is necessary. – mikuszefski Jan 18 '22 at 14:51
1

You can get a one line solution (not counting the import) using univariate splines of degree one. Like this

from scipy.interpolate import UnivariateSpline

f = UnivariateSpline(x,y,k=1,s=0)

Here k=1 means we interpolate using polynomials of degree one aka lines. s is the smoothing parameter. It decides how much you want to compromise on the fit to avoid using too many segments. Setting it to zero means no compromises i.e. the line HAS to go threw all points. See the documentation.

Then

plt.plot(x, y, "o", label='original data')
plt.plot(x, f(x), label='linear interpolation')
plt.legend()
plt.savefig("out.png", dpi=300)

gives linear spline interpolation

Lukas S
  • 3,212
  • 2
  • 13
  • 25
  • The problem with this approach is, when there is noice in the data. The it will just make N-1 number of lines between the N points and not the 3 segment I looking for. – afd Jan 18 '22 at 13:23
  • @afd Yea in that case you have to pick a bigger `s`. Then it won't. – Lukas S Jan 18 '22 at 14:30