7

Is there a built-in function in python which returns a length of longest common subsequence of two lists?

a=[1,2,6,5,4,8]
b=[2,1,6,5,4,4]

print a.llcs(b)

>>> 3

I tried to find longest common subsequence and then get length of it but I think there must be a better solution.

Milano
  • 18,048
  • 37
  • 153
  • 353
  • 1
    Your sample output is wrong; the LCS is `[2, 6, 5, 4]` so the *length* is 4. – Martijn Pieters Jul 03 '14 at 07:50
  • @MartijnPieters No, it is correct. LCS is [6,5,4] have a look again :) Your function says the same. >>> 3 – Milano Jul 03 '14 at 08:01
  • 1
    That is not the LCS. It may be the longest **contiguous** subsequence, but not the LCS. The LCS does not have to be contiguous (all elements right next to one another). You are confusing LCS with the [longest common substring problem](http://en.wikipedia.org/wiki/Longest_common_substring_problem). – Martijn Pieters Jul 03 '14 at 08:11
  • @MartijnPieters Thanks for explanation, what is LCS then exactly? It is something like intersection of two sets but with possible duplicates? – Milano Jul 03 '14 at 08:25
  • Both sequences contain `[2, 6, 5, 4]`, in that order. In `a` the elements are consecutive, but in `b` there is one value in between (`1`), but it is a subsequence of both `a` and `b`. – Martijn Pieters Jul 03 '14 at 08:40
  • See the [Wikipedia article](http://en.wikipedia.org/wiki/Longest_common_subsequence_problem); the intro calls out the common confusion as well. – Martijn Pieters Jul 03 '14 at 08:42

1 Answers1

13

You can easily retool a Longest Common Subsequence (LCS) into a Length of the Longest Common Subsequence (LLCS):

def lcs_length(a, b):
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            table[i][j] = (
                table[i - 1][j - 1] + 1 if ca == cb else
                max(table[i][j - 1], table[i - 1][j]))
    return table[-1][-1]

Demo:

>>> a=[1,2,6,5,4,8]
>>> b=[2,1,6,5,4,4]
>>> lcs_length(a, b)
4

If you wanted the longest common substring (a different, but related problem, where the subsequence is contiguous), use:

def lcsubstring_length(a, b):
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    longest = 0
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                length = table[i][j] = table[i - 1][j - 1] + 1
                longest = max(longest, length)
    return longest

This is very similar to the lcs_length dynamic programming approach, but we track the maximum length found so far (since it is no longer guaranteed the last element in the table is the maximum).

This returns 3:

>>> lcsubstring_length(a, b)
3

A sparse table variant to not have to track all the 0s (use this if a and b are potentially very large):

def lcsubstring_length(a, b):
    table = {}
    longest = 0
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                length = table[i, j] = table.get((i - 1, j - 1), 0) + 1
                longest = max(longest, length)
    return longest
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • if you provide explantion or algorithm how u solved the answer . will be helpful to learn – sundar nataraj Jul 03 '14 at 08:15
  • 1
    @sundarnatarajСундар: it is a Python implementation of the [dynamic algorithm described in the Wikipedia article](http://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Code_for_the_dynamic_programming_solution). – Martijn Pieters Jul 03 '14 at 08:19
  • Just a quick update, in *Python 3* there is no `xrange()`, just use `range()`. – juanbretti Nov 21 '22 at 14:37
  • @juanbretti: I'll update the answer to use Python 3 syntax, but note that the question was asked 8 years ago. :-) – Martijn Pieters Nov 25 '22 at 12:11