1

I am trying to implement this function with recursion in python and I have a mistake. I can't understand what is the mistake, could you help me?

The code:

def LongestCommonSubsequence(X,Y,tailX,tailY):
    if tailX == tailY and tailX!='' and (X=='' or Y==''):
            return len(tailX)
    elif X=='' or Y=='':
            return 0
    else:

        return max( LongestCommonSubsequence(X[1:],Y[1:],tailX+X[0],tailY+Y[0]),
                    LongestCommonSubsequence(X[1:],Y[1:],tailX+X[0],tailY),
                    LongestCommonSubsequence(X[1:],Y[1:],tailX,tailY+Y[0]),
                    LongestCommonSubsequence(X[1:],Y[1:],tailX,tailY)) 

X=raw_input() 
Y=raw_input() 
print LongestCommonSubsequence(X,Y,'','')

input:

abccdabab 
bacdbeb 

expected output:5
what i get:4

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
michael kova
  • 105
  • 2
  • 8
  • Will X and Y always be strings? If they're some other collection, ex. a list, then `elif X=='' or Y==''` will always be False. You could take advantage of the inherent type conversion of boolean contexts and just do `elif not X or not Y:` – Kevin Feb 06 '15 at 15:13
  • I duped this to a working LLCS implementation; it works for strings as well as lists. – Martijn Pieters Feb 06 '15 at 15:18
  • If you want this reopened, please provide more information. Give us sample input, expected output, what you get instead (including error messages) and where you yourself think the problem lies. As it stands this isn't a good question. – Martijn Pieters Feb 06 '15 at 15:19
  • input: abccdabab bacdbeb expected output:5 what i get:4 – michael kova Feb 06 '15 at 15:20
  • This is not a duplicated question, I have not found a recursive implementation ! – michael kova Feb 06 '15 at 15:22
  • Thanks for providing expected and actual output :-) But I'm not entirely familiar with this kind of problem. You seem to have provided arguments for X and Y, but what should we use for `tailX` and `tailY`? Can you show exactly how to call `LongestCommonSubsequence`? – Kevin Feb 06 '15 at 15:22
  • yes : X=raw_input() Y=raw_input() print LongestCommonSubsequence(X,Y,'','') – michael kova Feb 06 '15 at 15:25
  • when I call the function tailX and tailY are empty. I use them to store the subsequences. – michael kova Feb 06 '15 at 15:25
  • Why are you storing subsequences in reverse order? Why store them *at all*? – Martijn Pieters Feb 06 '15 at 15:28
  • First, this is not a reversed order. Second, I should have written "store" because this is not actually storing them.. – michael kova Feb 06 '15 at 15:30
  • You appear to be storing the **head**, not the tail of your strings under test, which is why it does make sense to append to the existing head-so-far each time. So you get `a` then `ab` then `abc`. But that doesn't actually explain why you are doing that. – Martijn Pieters Feb 06 '15 at 15:31
  • How else would I go through all the options for sub strings recursively ? – michael kova Feb 06 '15 at 15:34

1 Answers1

2

You appear to be trying to optimise for common tail strings here; if both strings end with the same tail you can indeed skip a few recursion steps.

But you are not actually building a tail, you are building the head, the characters at the start.

Here is a working recursive llcs without that optimisation:

def llcs(xstr, ystr):
    if not xstr or not ystr:
        return 0
    x, xtail, y, ytail = xstr[0], xstr[1:], ystr[0], ystr[1:]
    if x == y:
        return 1 + llcs(xtail, ytail)
    return max(llcs(xstr, ytail), llcs(xtail, ystr))

This finds the maximum longest common substring length by comparing lengths found for removing a character from the start of either xstr or ystr, not both.

Your code specifically never pairs up X with Y[1:] or X[1:] with Y for the max() call, so you never will find an LCS for that specific starting character in either X or Y.

You can then try and optimise by looking at xtail and ytail (actual tails) and bail out early:

def llcs(xstr, ystr):
    if not xstr or not ystr:
        return 0
    x, xtail, y, ytail = xstr[0], xstr[1:], ystr[0], ystr[1:]
    if x == y:
        if xtail == ytail:
            # if the tails match, bail out early
            return 1 + len(xtail)
        return 1 + llcs(xtail, ytail)
    return max(llcs(xstr, ytail), llcs(xtail, ystr))
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • I can't see how is this recursive.. and I don't want to find the longest one - I just wan't to find that length. – michael kova Feb 06 '15 at 15:38
  • @michaelkova: it is recursive but I made typos in the recursive call name. The *length* is returned, not the actual substring. The functions return `5` for your sample input. – Martijn Pieters Feb 06 '15 at 15:39
  • Oh OK I thought you used the lcs function. Well It does work ! Thank you. Can you explain me why your first implementation works ? I'm having trouble understanding that. – michael kova Feb 06 '15 at 15:47
  • @michaelkova: take a grid, lay out one string along the top, the other along the side. Then mark all the cells where you have a matching character (that's the `x == y` case); the maximum length is then created as a path through the grid from top left to bottom right, where you can only go right or down. The recursion just looks at the top left cell and sees the rest of the table as a recursive call, one for a step to the right, and one for a step down, and it'll pick the one with the highest value returned. – Martijn Pieters Feb 06 '15 at 15:53
  • @michaelkova: the step to the right and the step down are implemented as substrings; remove the first character of `x` is one direction, remove the first character of `y` is another. – Martijn Pieters Feb 06 '15 at 15:54
  • @michaelkova: see the [wikipedia page on the problem](http://en.wikipedia.org/wiki/Longest_common_subsequence_problem), where you'll see such tables and lengths filled in. Recursion is the *traceback approach*. – Martijn Pieters Feb 06 '15 at 15:56