1

I'm trying to fit two connected lines to noisy data with a robust method, the goal is to find the interconnection (vertical dashed red line in image below).

Noisy data and solution

One approach was to detect the step in derivative (diff) but this would require filtering, estimating the threshold and could give ambiguity.

My current approach is to use fminbnd and polyfit but I wonder if there is a more performant or straight-forward way to do this:

% create sample data
x = linspace (0, 3);
y = 1.37 * x;
r = x > 1.7;
y(r) = y(r) + (x(r) - x(find (r, 1))) * 4;
y = y + 0.2 * randn (size (y));

% plot it
plot (x, y)
grid on

function ret = d (s, x, y)
  assert (s <= 1.0 && s >= 0);
  k = round (s * numel (x));

  [~, s1] = polyfit (x(1:k), y(1:k), 1);
  [~, s2] = polyfit (x(k+1:end), y(k+1:end), 1);

  ret = s1.normr + s2.normr;
end

da = @(s) d(s,x,y);

[X, FVAL, INFO, OUT] = fminbnd (da, 0, 1)
xp = x(round (X * numel (y)))

line ([xp xp], [min(y) max(y)], 'linestyle', '--', 'color', 'red')

This code was written for GNU Octave but I tried to be MATLAB compatible. Please comment if it doesn't run on Matlab

Andy
  • 7,931
  • 4
  • 25
  • 45
  • What about detecting a peak in the second derivative? Filtering may be needed, but will not affect the peak location that much if you use `filtfilt` (zero phase lag filter). – m7913d Sep 22 '17 at 07:39
  • Another option is to fit a hyperbola to the data. [This](https://stackoverflow.com/questions/19955686/fit-a-curve-for-data-made-up-of-two-distinct-regimes) may be useful. – rahnema1 Sep 22 '17 at 09:23
  • I think your approach is quite good, robust and simple. If you want to improve performance, then you can write your own polyfit variant for a straight line. At the first stage (before call of fminbnd), you can calculate the running sums that are included in the formula for calculating linear regression. In the second stage (in the function ret), calculations will be made by manipulating the accumulated running sums, ie very fast. – SergV Sep 24 '17 at 15:05

1 Answers1

0

I try to give a short outline of an approach that could be better in terms of runtime and robustness than the OP's suggestion. I would recommend to use a robust polyfit, for example using a Huber-k-estimator in a loop with a normal polyfit to select data points by residual that do have a common statistical distribution width. Because the estimator provides the distribution width it can be used to robustly select points close to the line fit with a criteria like residual < 6*sigma.

A first line fit on the full data set with this basic algorithm will give you a result that is matching the longer of the two line segments.

With a second call to the robust fit using only the data points refused by the first fit you will get the parameters of the second line segment.

Georg W.
  • 1,292
  • 10
  • 27