1

I have a list of indexes:

[24, 175, 78, 80, 659, 126, 141, 149, 29, 158, 178, 179]

I want to know how to identify (and remove) which ones are out of sequence.

desired outcome:

[24, 78, 80, 126, 141, 149, 158, 178, 179]

As a human, I see that 175, 659, and 29 stand out, but am unsure of how to do this programmatically. I have tried a pairwise comparison (a subset of the index returning the first value if sub_arr[0] < sub_arr[1].

new_ls = []

def eval_adjacent(ls):
    if ls[1] > ls[0]:
        return ls[0]

for n, ele in enumerate(idx_ls[:-1]):
    res = eval_adjacent(idx_ls[n:n+2])
    if res:
        new_ls.append(res)

However, if the integer is less than it should be, this won't work (29). Have thought about iterating in both directions but am starting to think this is not the way to go.

I think comparing it to sorted(ls) is potentially easier - but am not sure how to select the ones which are desirable (rejecting the remainder).

Can anyone point me in the right direction?

I'mahdi
  • 23,382
  • 5
  • 22
  • 30
Cheeter_P
  • 71
  • 8
  • 4
    How would you decide that your desired outcome is correct as opposed to, for example, [24, 175, 178, 179]? Can we simply look for the **longest** increasing list that can be obtained by removing elements? – Ben Grossmann Dec 23 '22 at 15:51
  • 8
    Which number is "out of sequence" in `[1, 3, 2]`? Should it be `[1, 3]` or `[1, 2]`, and why? – tobias_k Dec 23 '22 at 15:51
  • it looks like you want to check for outliers, in which case, you should find a regression that fit's your need. you can then substract the regression from your data, compute the deviation then remove all values that are out of (2) or (3) times the deviation, you will have to fine tune your script – PandaBlue Dec 23 '22 at 15:56
  • are you trying to say ... every number in the middle of two numbers cannot be larger than the number to its right or left, (barring first and last number of the list) ? – Zenith_1024 Dec 23 '22 at 16:01
  • 1
    If @BenGrossmann guess is correct, you might find your answer here: https://stackoverflow.com/questions/3992697/longest-increasing-subsequence – JonSG Dec 23 '22 at 16:16
  • @tobias_k As a human, I see that `1` stands out there, it's the only one that's not prime. So the result should be `[3, 2]`. – Kelly Bundy Dec 23 '22 at 16:29
  • As a human, you don't realize that the problem is incompletely specified, making the solution ambiguous. –  Dec 23 '22 at 16:42

4 Answers4

1

It seems like you want the longest increasing subsequence.

Try this:

from math import floor


# from https://en.wikipedia.org/wiki/Longest_increasing_subsequence
def lis(X):
    N = len(X)
    P = [0] * N
    M = [0] * N
    M[0] = -1

    L = 0
    for i in range(N):
        lo = 1
        hi = L + 1
        while lo < hi:
            mid = lo + floor((hi-lo)/2)
            if X[M[mid]] > X[i]:
                hi = mid
            else:
                lo = mid + 1

        newL = lo

        P[i] = M[newL-1]
        M[newL] = i

        if newL > L:
            L = newL

    S = [0] * N
    k = M[L]
    for j in range(L-1, -1, -1):
        S[j] = X[k]
        k = P[k]

    S = [el for el in S if el != 0]
    return S


data = [24, 175, 78, 80, 659, 126, 141, 149, 29, 158, 178, 179]
print(lis(data))  # => [24, 78, 80, 126, 141, 149, 158, 178, 179]
Michael M.
  • 10,486
  • 9
  • 18
  • 34
1

You can use dynamic programming:

def long_inc_seq(lst):
    dp = [[n] for n in lst]
    for i in range(len(lst)):
        for j in range(i):
            if lst[i] > lst[j] and len(dp[i]) < len(dp[j]) + 1:
                dp[i] = dp[j] + [lst[i]]
    return max(dp, key=len)

result = long_inc_seq([24, 175, 78, 80, 659, 126, 141, 149, 29, 158, 178, 179])
print(result)

Output:

[24, 78, 80, 126, 141, 149, 158, 178, 179]

For explanation:

# The inside of dp for the above example:
>>> dp
[[24],
 [24, 175],
 [24, 78],
 [24, 78, 80],
 [24, 78, 80, 659],
 [24, 78, 80, 126],
 [24, 78, 80, 126, 141],
 [24, 78, 80, 126, 141, 149],
 [24, 29],
 [24, 78, 80, 126, 141, 149, 158],
 [24, 78, 80, 126, 141, 149, 158, 178],
 [24, 78, 80, 126, 141, 149, 158, 178, 179]]
I'mahdi
  • 23,382
  • 5
  • 22
  • 30
0

You can use list comprehension and windowed from more_itertools:

  • the windowed function allows you to create a sliding window of the set
  • each sliding window is examined for numbers that are out of sequence
  • these numbers are listed in the variable called nums_to_reject
  • and are then deleted from the original list of indices to produce the result

Code:

from more_itertools import windowed

indices = [24, 175, 78, 80, 659, 126, 141, 149, 29, 158, 178, 179]

nums_to_reject = [item[1] for item in windowed(indices, 3) if item[0] < item[2] < item[1] or item[1] < item[2] > item[0] > item[1]]    
result = sorted(list(set(indices) - set(nums_to_reject)))        

print(result)

Output:

[24, 78, 80, 126, 141, 149, 158, 178, 179]
ScottC
  • 3,941
  • 1
  • 6
  • 20
0

Here's a method that I think can be understood by any type of python programmer: (I'm assuming that there are no 2 subsequent numbers out of sequence)

my_list = [24, 175, 78, 80, 659, 126, 141, 149, 29, 158, 178, 179]
my_list2 = []
for i in range(len(my_list)):
    try:
        if not (my_list[i + 1] > my_list[i] and my_list[i + 1] > my_list[i + 2]):
            my_list2.append(my_list[i + 1])
    except(IndexError):  # we add this to prevent INDEX OUT OF RANGE exception
        pass
print(my_list2)

OUTPUT

[78, 80, 126, 141, 29, 158, 178]