I have data like that shown below, and I would like to fit a straight line (like the black line) while ignoring the points affected by the ceiling effect. I would then like to predict the Y values of those points from the fitted line and find the expected difference due to the ceiling effect.

I tried the following code from another Stack Overflow question, but it does not do a good job of ignoring the points affected by the ceiling effect.

from scipy.optimize import curve_fit

def f(x, A, B): # this is your 'straight line' y=f(x)
    return A*x + B

popt, pcov = curve_fit(f, x, y) # your data x, y to fit

Please feel free to provide any ideas or suggestions. Thank you.

[Image: plot of the data]

Arch Desai
  • To use `curve_fit` in the manner you are outlining, you would need to exclude the points that exhibit a ceiling effect. Is there a way of doing that with your data? – DrBwts Feb 09 '22 at 21:32
  • There is no threshold value of x that I can use to filter the data; otherwise it would have made my life easier. – Arch Desai Feb 09 '22 at 21:35
  • You could exclude all data with `x > 850` just to get the regression line. – DrBwts Feb 09 '22 at 21:37
  • If your data is unordered in `x` (so you can't use my answer below), you can use a `y` threshold. If you filter your data so that only points with `y < 2000` are fitted, you should get a good fit. Given that the data is linear, losing part of it isn't a huge problem. Also, I imagine the ceiling is due to sensor saturation, so the part just before it may have some non-linearity related to sensor response, and rejecting part of it may actually be good for the fit. – Pepsi-Joe Feb 09 '22 at 22:11
  • You can find a solution to this [here](https://de.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf) on page 8. Alternatively you may modify [this](https://stackoverflow.com/a/70713496/803359) solution – mikuszefski Feb 10 '22 at 15:07

3 Answers

One option is to write a loop that excludes data until the fit is good:

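from scipy.stats import linregress

fit = linregress(x, y)
i = 1  # start at 1: with i = 0, x[:-0] is an empty slice
# drop points from the end until the remaining data is close to linear,
# keeping at least 3 points so the loop always terminates
while fit.rvalue < 0.999 and len(x) - i >= 3:
    fit = linregress(x[:-i], y[:-i])
    i += 1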

Note that this assumes your data is ordered by x. I prefer linregress for fitting lines, as it gives an easy-to-access rvalue.
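As a rough, self-contained sketch of the trimming idea above: the data here is synthetic (a line with slope 2.5 capped at 2000, chosen only for illustration), and the helper name `trim_until_linear` and its `r_min` and `min_points` parameters are hypothetical, not part of SciPy. It sorts by x first, so it does not rely on the input already being ordered.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic stand-in for the question's data (an assumption, not the real data):
# a line y = 2.5*x + 10 that saturates at a ceiling of 2000.
x = np.linspace(0, 1000, 101)
y = np.minimum(2.5 * x + 10, 2000)

def trim_until_linear(x, y, r_min=0.999, min_points=3):
    """Drop the highest-x points one at a time until rvalue >= r_min."""
    order = np.argsort(x)          # the trimming needs data ordered by x
    xs, ys = x[order], y[order]
    n_kept = len(xs)
    fit = linregress(xs, ys)
    while fit.rvalue < r_min and n_kept > min_points:
        n_kept -= 1                # remove one more point from the high-x end
        fit = linregress(xs[:n_kept], ys[:n_kept])
    return fit, n_kept

fit, n_kept = trim_until_linear(x, y)
print(fit.slope, fit.intercept, n_kept)
```

The loop stops as soon as r reaches the threshold, so a few points at the start of the ceiling may still be included and pull the slope slightly low. A stricter `r_min` reduces that bias but makes the loop more sensitive to noise.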

Muhammad Mohsin Khan
Pepsi-Joe

Assuming you have your X and Y as np.array, you can filter out the samples with x > 850 (or y > 2000 from what I see on your plot), like this:

X_clean = X[X < 850]
Y_clean = Y[X < 850]
# or [Y < 2000]

In this way you will have just the points not affected by the ceiling effect. You can then run your linear regression on that data.
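A minimal sketch of the remaining steps the question asks for, assuming a fixed threshold is acceptable: fit on the filtered points, extrapolate the line to the excluded points, and take the difference. The data is synthetic (slope 2.3, ceiling 2000, both chosen for illustration), and `Y_expected` and `ceiling_deficit` are names introduced here.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic stand-in data (an assumption): y = 2.3*x capped at 2000.
X = np.linspace(0, 1200, 121)
Y = np.minimum(2.3 * X, 2000)

mask = X < 850                      # threshold from the answer above; or Y < 2000
fit = linregress(X[mask], Y[mask])  # fit only the unaffected points

# Extrapolate the line to the excluded points and compare with the observed values.
Y_expected = fit.slope * X[~mask] + fit.intercept
ceiling_deficit = Y_expected - Y[~mask]
print(fit.slope, ceiling_deficit.max())
```

Points just above the threshold that have not yet reached the ceiling get a deficit close to zero, so a slightly conservative threshold does not distort the result.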

rikyeah

Thank you, everyone, for the suggestions. If I modify my function as below, it fits the data well.

import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    # straight line a*(x-c) + b, capped at the ceiling value a*c
    return np.minimum(a*(x-c) + b, a*c)

popt, pcov = curve_fit(f, x, y) # your data x, y to fit
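A hedged sketch of how the fitted parameters of this `f` could be turned into the line and the ceiling deficit. Expanding `a*(x-c) + b` gives slope `a` and intercept `b - a*c`, with the ceiling at `a*c`. The data is synthetic with seeded noise, and the initial guess `p0` assumes the lower half of the x range is unaffected by the ceiling; both are assumptions for illustration, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    # straight line a*(x - c) + b, capped at the ceiling value a*c
    return np.minimum(a * (x - c) + b, a * c)

# Synthetic stand-in data (an assumption): y = 2.3*x + 50 capped at 2000, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1200, 121)
y = np.minimum(2.3 * x + 50, 2000) + rng.normal(0, 5, x.size)

# Initial guess: line from the lower half of x (assumed ceiling-free), ceiling from max(y).
lower = x <= np.median(x)
slope0, icpt0 = np.polyfit(x[lower], y[lower], 1)
ceil0 = y.max()
p0 = [slope0, icpt0 + ceil0, ceil0 / slope0]   # a, b, c such that a*c = ceil0

popt, pcov = curve_fit(f, x, y, p0=p0)
a, b, c = popt
slope, intercept, ceiling = a, b - a * c, a * c

# Expected values on the uncapped line, and the shortfall where the line exceeds the ceiling.
line = slope * x + intercept
affected = line > ceiling
deficit = line[affected] - y[affected]
print(slope, intercept, ceiling, deficit.max())
```

Supplying `p0` matters here because `curve_fit` otherwise starts every parameter at 1, which is far from a sensible line and ceiling for data on this scale.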


Arch Desai