0

I am trying to fit 3-dimensional data (that is, 2 independent and 1 dependent variable) using multivariate fitting in scipy curve_fit. I wish to do piecewise fitting for the same problem. I have tried to proceed on the basis of this without any success. The problem is defined below:

import numpy as np
from scipy.optimize import curve_fit
#..........................................................................................................
def F0(X, a, b, c, c0, y0):
    x, y = X
    value = []
    for i in range(0, len(x)):
        if y[i] < y0:
            lnZ = x[i] + c0*y[i]
        else:
            lnZ = x[i] + c*y[i]
        val = a + (b*lnZ)
        value.append(val)
    return value
#..........................................................................................................
def F1(X, a, b, c):
    x, y = X
    lnZ = x + c*y
    value = a + (b*lnZ)
    return value
#..........................................................................................................
x = [-2.302585093,
-2.302585093, 
-2.302585093, 
-2.302585093, 
-2.302585093,
-2.302585093,
-2.302585093,
0,
0,
0,
0,
0,
0,
0,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093
]

y = [7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04
]

z = [4.077424497,
4.358253892, 
4.610475878, 
4.881769469,
5.153063061,
5.323277142,
5.462023074,
4.610475878,
4.840765517,
5.04864602,
5.235070966,
5.351407761,
5.440090728,
5.540693448,
4.960439843,
5.118257381,
5.266539115,
5.370479367,
5.440090728,
5.528296904,
5.5816974,
]

popt, pcov = curve_fit(F0, (x, y), z, method = 'lm')
print(popt)

popt, pcov = curve_fit(F1, (x, y), z, method = 'lm')
print(popt)

The output is:

[1.34957781e+00 1.05456428e-01 1.00000000e+00 4.14879613e+04
 1.00000000e+00]
[1.34957771e+00 1.05456434e-01 4.14879603e+04]

You can see that the values of parameters in the piecewise fitting remain as the initial values. I know I am not doing it in the correct way. Please correct me.

Parag
  • 41
  • 8

1 Answers1

0

The main source of the problem is the insensitivity of this approach to the value of the variable that defines the switch from one function to another (see this response for a similar explanation). Moreover, the choice of starting parameters isn't good.

Since no starting values are provided, curve_fit chooses a value of 1 for all the fitting parameters (see here the default value for p0). Since the fitting algorithm works by making small variations on the parameters, y0 is varied in small steps around 1, which produces no changes in the output of the function (all y values are much smaller than 1). Since y[i] < y0 is always True and only the first branch is ever evaluated, and the output of the function does not depend on the value of c. That explains why y0 and c stay at the initial values.

One might expect that setting y0 initial value to be inside of the range of values that are evaluated (i.e. around 8E-4) might solve the problem. Indeed, since the second branch is evaluated, the value of c is now optimized. Nevertheless, y0 value will stay unchanged. As the fitting algorithm works testing very small changes to the values, the changes are not large enough to move from the interval between two experimental y values to another one. In this particular case, if one chooses 8E-4, the small variations will never be enough to make it go over 8.17E-04 or below 7.85E-4, that are the values encompassing initial y0 choice.

One can usually circumvent this problem making the function depend explicitly on the value of y0. A smart choice would be to redefine the function so the value at y0 is the same no matter which branch is taken (i.e. ensure that the function is continuous). In this case, the function definition does not ensure so. A reasonable change would be:

def F2(X, a, b, c, c0, y0):
    x, y = X
    value = []
    for i in range(0, len(x)):
        lnZ = x[i] + c0 * y[i]
        if y[i] >= y0:
            lnZ += c * (y[i]-y0)
        val = a + (b*lnZ)
        value.append(val)
    return value

which changes the meaning of the parameter c, and limits the results to only continuous functions. In this case, the value of y0 is indeed the function turning point. Nevertheless, it yields the desired results:

popt2, pcov = curve_fit(F2, (x, y), z, p0=(1, 1, 1E4, 1E4, 9.1E-4), method = 'lm')
print(popt2)

results in:

[-1.93417968e-01  1.05456433e-01 -3.65740192e+04  5.97890809e+04
  8.64354057e-04]

A better (pythonic) definition for the function avoids the for loop:

def F3(X, a, b, c, c0, y0):
    x, y = X
    lnZ = x + c0 * y
    idx = np.where(y>=y0)
    lnZ[idx] += c * (y[idx] - y0)
    rv = a + (b * lnZ)
    return rv

which will probably be much faster for larger datasets.

azelcer
  • 1,383
  • 1
  • 3
  • 7
  • The point is, whatever starting parameters you choose, the values will remain same. I have already tried that too. Thank youj for your comment. – Parag Dec 27 '21 at 05:14
  • Yes, they stay the same and I explain why. It is easy to circumvent this issue, but it is necessary to change the function (for example, ensuring that it is continuous) – azelcer Dec 27 '21 at 18:41
  • Yes, exactly. That's the question, the problem lies with the way the function is defined - the unchanged initial values indicate that. – Parag Dec 28 '21 at 07:09
  • I have added a reasonable redefinition of the function, that only introduces continuty. I don't think you can get a better fit without using a very different function. – azelcer Dec 28 '21 at 23:23
  • The value of y0 for your functions do make sense and so is the value of b. The data is related to an actual physical problem which requires that c, c0 > 0. I shall check for other datasets. Thank you so much. – Parag Dec 29 '21 at 07:35
  • @Parag: The meaning of `c` has changed. It's like (but not precisely) `c-c0` of the previous function. I would suggest to re-check the physical model with a the python implementation for a fixed value of `x`, just to see if the translation from the model to the program is correct. – azelcer Dec 29 '21 at 14:01
  • Yes, it is correct for y0; check with fixed x. Thanks. – Parag Dec 29 '21 at 15:35