2

I have a list of lists ("sublists") and I want to see if the same sequence of any unspecified length occurs in more than one sublist. To clarify, the order of items must be preserved - I do not want the intersection of each sublist as a set. There must be at least 2 items that match sequentially. Please see example below.

Input:

someList = [[0,1,3,4,3,7,2],[2,3,4,3],[0,3,4,3,7,3]]

Desired Output: (will be printed to file but don't worry about this detail)

sublist0_sublist1 = [3,4,3] #intersection of 1st and 2nd sublists

sublist0_sublist2 = [3,4,3,7] #intersection of 1st and 3rd sublists

sublist1_sublist2 = [3,4,3] #intersection of 2nd and 3rd sublists

leon
  • 65
  • 1
  • 8
  • 2
    duplicate of http://stackoverflow.com/questions/16120751/is-this-longest-common-subsequence-correct – Colleen May 16 '13 at 16:37
  • Must the items in the result appear consecutively in the input lists? – interjay May 16 '13 at 16:40
  • @interjay Yes, if I understand your question correctly. In the example I gave, both the 1st and 3rd sublists contain '0' but in the 1st sublist the '0' is not "touching/connected to" the '3' (it is separated by '1'), which is why the '0' isn't counted as part of the intersection. – leon May 16 '13 at 17:00
  • @Colleen I don't think it's a duplicate. The question you linked me to looks very complicated and I didn't understand the Wikipedia article about LISs. My question is much simpler, or at least it is more straightforward and in simpler words. I apologize if I am too much of a noob to realize that it is a duplicate question. – leon May 16 '13 at 17:08
  • 2
    Then look up "longest common substring" algorithm, assuming you want the longest such substring when there is more than one. @Colleen's link doesn't seem relevant. – interjay May 16 '13 at 17:11
  • 1
    Ah, failed to realize the other question is 1. more restrictive than the title says and 2. does not require contiguous subsequences. In that case, look at http://stackoverflow.com/questions/14032903/longest-common-contiguous-subsequence-algorithm for algorithmic help, though it's in c++ and not python. – Colleen May 16 '13 at 17:16
  • @interjay That's pretty close to what I want, but the problem is that that solution deals with strings, and it's looking for a bunch of matching characters. In other words, for it to work with my input, I would have to convert it into a string first: [[0,1,3,4,3,7,2],[2,3,4,3]] would need to be converted to ['0134372','2343']. This is a problem since [0,1,3,4] and [0,13,4] would both become ['0134'] and turn up false results. Please correct me if I am mistaken. – leon May 16 '13 at 17:37
  • 1
    Maybe you can combine them into strings delimited by some symbol like a comma. Matching on "1,34" and [1,34] should return the same result. – MxLDevs May 16 '13 at 17:52
  • @Keikoku Oh, that's a good idea. – leon May 16 '13 at 17:53
  • I feel like it's going to be very cumbersome, converting everything into a string and then converting everything back. BUT, it looks like it'll get the job done. If anyone comes up with an elegant solution, please let me know! Thanks. – leon May 16 '13 at 17:56
  • What if you have multiple intersections between two lists? – Noctis Skytower May 16 '13 at 19:20
  • Only the longest one would get printed. If they are the same length, both would get printed. It would be another list of lists! – leon May 16 '13 at 19:41

1 Answers1

1

Whipped this up for you (including your comment that equal-length maximum sublists should all be returned in a list):

def sublists(list1, list2):
    subs = []
    for i in range(len(list1)-1):
        for j in range(len(list2)-1):
            if list1[i]==list2[j] and list1[i+1]==list2[j+1]:
                m = i+2
                n = j+2
                while m<len(list1) and n<len(list2) and list1[m]==list2[n]:
                    m += 1
                    n += 1
                subs.append(list1[i:m])
    return subs

def max_sublists(list1, list2):
    subls = sublists(list1, list2)
    if len(subls)==0:
        return []
    else:
        max_len = max(len(subl) for subl in subls)
        return [subl for subl in subls if len(subl)==max_len]

This works allright for these cases:

In [10]: max_sublists([0,1,3,4,3,7,2],[0,3,4,3,7,3])
Out[10]: [[3, 4, 3, 7]]
In [11]: max_sublists([0,1,2,3,0,1,3,5,2],[1,2,3,4,5,1,3,5,3,7,3])
Out[11]: [[1, 2, 3], [1, 3, 5]]

It's not pretty though, nor is it really fast.

You only have to figure out how to compare every sublist in your original list of sublists, but that should be easy.

[Edit: I fixed a bug and prevented your error from occurring.]

  • Elegant! Thank you!! It is clever that you looked for 2 matches occurring in a row for the "if" loop. That saves time. The only small problem I am having is that I get an error on the "max_len" line. I think it's because sometimes there's only one intersection (or none), and it's expecting to run the "for" loop at least once...or something like that. I haven't figured it out yet. – leon May 16 '13 at 23:21
  • You're welcome! The problem on your max_len line may very well just be Python2.X vs. Python 3.X. If you're using Python 2.X, you may have two write that line as: `max_len = max([len(subl) for subl in subls])` (so, with square brackets to force a list comprehension in the max-function first). I hope that works! – Ruben Tavernier May 17 '13 at 07:37
  • BTW, I think I made a typo in my line `for j in range(len(list2)-2):`, which should only subtract 1 (you can see where I got confused. ;-) -- I will retest and edit. – Ruben Tavernier May 17 '13 at 07:42
  • It turns out that the max_len line error is way simpler. You probably got `ValueError: max() arg is an empty sequence`, so I fixed the function to return an empty list when no matching sublists are found. I also fixed a typo that bugged some tests (I was tired, yesterday). – Ruben Tavernier May 17 '13 at 07:56
  • 1
    Found the bug! I typed "print" instead of "return" at the end of sublists(). Haha. I don't think it's even necessary to tell max_sublists() to return an empty list, because sublists() is already returning an empty list as a default. Thanks for all your help! – leon May 17 '13 at 17:04
  • You're welcome, but it's still necessary, because if sublists returns an empty list, calling `max(len(subl) for subl in [])` will result in a `ValueError: max() arg is an empty sequence`. – Ruben Tavernier May 19 '13 at 08:35