0

This algorithm (originally implemented in unl-aligner) calculates the longest list of increasing numbers with correspondingly increasing indices in the sequence, so given

seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

it will return

[0, 2, 6, 9, 11, 15]

the implementation looks like

def subseq(seq, keyfn=lambda value: value):
    if not seq: return seq
    tail = []
    prev = []
    for i in range(len(seq)):
        for k in range(len(tail)-1, -1, -1):
            if keyfn(seq[tail[k]]) < keyfn(seq[i]):
                if len(tail) == k+1:
                    tail.append(i)
                elif keyfn(seq[tail[k+1]]) > keyfn(seq[i]):
                    tail[k+1] = i
                prev.append(tail[k])                    
                break
        else:
            tail.append(i)
            prev.append(None)

    i = tail[-1]
    subseq = [seq[i]]
    while prev[i] is not None:
        i = prev[i]
        subseq.append(seq[i])
    subseq.reverse()
    return subseq

The algorithm performs a linear scan, while a bisect (binary) search should be preferred. Which is the best approach to refactor it to perform a binary search?

loretoparisi
  • 15,724
  • 11
  • 102
  • 146
  • 1
    I don't think there's a way to do this more efficiently with a binary search vs. linear. The reason is because a binary search works on a sorted list, since you can compare adjacent elements. It's not obvious to me if that can be applied here. – pault May 23 '19 at 19:21
  • 2
    How can you bisect something that isn't sorted? Maybe I'm misunderstanding the question. – ggorlen May 23 '19 at 19:23
  • 1
    first sort your sequence / use `bisect` to keep the list sorted. Now you can apply binary search. – Jean-François Fabre May 23 '19 at 19:24
  • 1
    Also, your output is not the *longest increasing subsequence* - I'm not sure what that output represents. – pault May 23 '19 at 19:25
  • @pault the result is the longest consecutive sequence of increasing integers numbers as you can see from the output. – loretoparisi May 23 '19 at 19:44
  • @ggorlen that's a good point. – loretoparisi May 23 '19 at 19:44
  • 1
    @loretoparisi none of the numbers in the output are consecutive. I *think* what you're describing is the longest list of increasing numbers in the sequence with correspondingly increasing indices in the sequence. Also your sample code fails because `tail` is undefined. – pault May 23 '19 at 19:59
  • @pault you are right I mean the indexes. In fact those indexes represents words in the actual implementations (see the link). Correcting the code thanks, here it is a running example: https://repl.it/@loretoparisi/LongestIncreasingSubsequence – loretoparisi May 23 '19 at 20:03

1 Answers1

1

With this answer:

bisect = "longest_subsequence([1,2,3,4,5,6,7,2,2,2,2,2,5,1,7,8])"
_subseq = "subseq([1,2,3,4,5,6,7,2,2,2,2,2,5,1,7,8])"

from timeit import timeit

print(timeit(bisect, globals=globals(), number=10000))  # 0.2994734
print(timeit(_subseq, globals=globals(), number=10000))  # 0.32428109999999993

This is the result on a totally random test, for your example they seem almost exact time-wise

Işık Kaplan
  • 2,815
  • 2
  • 13
  • 28