0

Given: n discrete data points [ti,xi], which should describe a monotonic function (ti = time, xi = data). Some of the points are "outliers", or disobey the monotonic function rule (x{i+1}>=x{i} for increasing, x{i+1}<=x{i} for decreasing).

I am trying to find an algorithm to determine the minimum number of data points I must eliminate to obtain a monotonic function. I also know if it is increasing or decreasing.

I tried with a moving median filter and identify points which are some variance above or under the filtered function, but I cannot identify all points.

What would be the best approach for this problem?

I am using MATLAB, but the solution can surely be generalized.

Andrei
  • 98
  • 5
  • 1
    What if the solution isn't unique? A simple example would be `x = [1 3 2]` where you could remove any one of the elements to make it monotonic, two mono-increasing and one mono-decreasing... If it's already ill-defined with the simplest possible case I don't imagine the general solution will come quickly! – Wolfie Feb 06 '18 at 16:55
  • Also, determining the _minimum_ number of data points to remove sounds like combinatorial optimization, and probably large complexity – Luis Mendo Feb 06 '18 at 17:09
  • Do you want to find the *length* of the longest increasing subsequence, or the subsequence itself (select specific indices of `x`)? – beaker Feb 06 '18 at 22:39
  • 2
    I forgot to add this link when I deleted my previous comment... This is the [longest increasing subsequence](https://stackoverflow.com/questions/2631726/how-to-determine-the-longest-increasing-subsequence-using-dynamic-programming) problem. – beaker Feb 06 '18 at 23:11
  • @Wolfie: it is clear the solution might not be unique, but I need any such solution. – Andrei Feb 07 '18 at 13:48
  • @beaker: that is exactly what I need! I chose your answer as the solution! – Andrei Feb 07 '18 at 13:49

2 Answers2

1

Here's a solution that finds a longest increasing subsequence from a given array using Patience sort. The solution is not necessarily unique, but it is guaranteed to have a length that is greater than or equal to any other increasing subsequence. A much simpler function is possible if you only want to know the length of the longest increasing subsequence.

function subsequence = PatienceLIS(sequence)
   % Use patience sort to generate a (not necessarily unique)
   %   longest increasing subsequence from a given sequence

   subInds = [];   % indices of subsequence elements
   for s = 1:numel(sequence)
      % put new element on first stack with top element > sequence(s)
      newStack = find(sequence(s) <= [sequence(subInds) inf], 1);
      subInds(newStack) = s;   % put current index on top of newStack
      if (newStack > 1)
         % point to the index currently to the left
         pred(s) = subInds(newStack - 1);
      end
   end
   % recover the subsequence indices
   % last element in subsequence is found from last subInds
   pathInds = subInds(end);
   while (pred(pathInds(1)) > 0)
      pathInds = [pred(pathInds(1)) pathInds];   % add predecessor index to list
   end
   subsequence = sequence(pathInds);   % recover subsequence from indices
end

Sample run:

x = [7 4 11 -1 13 12 10 8 5 14 14 12 15 6 1 3 2 9];

>> PatienceLIS(x)
ans =

    4   11   12   14   15
beaker
  • 16,331
  • 3
  • 32
  • 49
  • 1
    This is exactly what I was searching for! I am using this routine on data with very few such outliers that need elimination and you solution is flawless. For decreasing sequence the vector can either be flipped or the routine slightly changed.Thank you! – Andrei Feb 07 '18 at 13:52
0

I thought of a recursive solution of limited usefulness (as it does not result in the longest subsequence), but perhaps

  1. ... it can be expanded to suit your needs.
  2. ... it can show you why this is not a trivial problem.

function [pts_to_remove, monoseq] = q48647287

sequence = randi(1000,1000,1,'int16');
[pts_to_remove, monoseq] = remove_pts(sequence, false, 0);
% Now we can try to setdiff different subsets of `monoseq` from `sequence` and rerun the 
% algorithm. Among the results we'll take the longest `monoseq`, and repeat the procedure as 
% many times as needed.
% Of course it needs to be (dis)proven mathematically whether this approach can result in 
% the longest possible subsequence.

end

function [pts_removed, seq] = remove_pts(seq, shouldIncrease, rem_pts_so_far)

  if shouldIncrease
    d = diff([-Inf; seq]) >= 0;
  else
    d = diff([ Inf; seq]) <= 0;
  end

  to_rem = sum(~d);
  if to_rem % > 0
    pts_removed = remove_pts(seq(d), shouldIncrease, rem_pts_so_far + to_rem);
  else
    pts_removed = rem_pts_so_far;
  end

end
Dev-iL
  • 23,742
  • 7
  • 57
  • 99