how to intersect lists in python with respect to index

Question

i know this question might look like a duplicate. but i had a hard time trying to solve this and i couldn't find a helpful solution for my case

i'am implementing a genetic algorithm using python for the traveling salesman problem

assume we have those lists ( tours)

a = [1,0,2,5,4,3,1]
b = [1,2,5,4,3,0,1]
c = [1,3,5,4,2,0,1]

as you can see, the [5,4] is repeated in the whole 3 lists and a regular intersection would return all the elements in the list.

i want some function like intersect_list(a,b)

that returns [5,4]

is there a python built-in way to find this? or do you have any suggestion?.

Note : i know i can loop it to solve this , but please put in mind that in my case i have around 400 lists. and at the length of 401 each.

in other words : i want to see the common path between those lists.

please let me know if anything was unclear thanks in advance.

So, to clarify, you want any runs of two or more items that are in the same order in both lists? — Gareth Latty, Jun 08 '12 at 00:07
Feels like a longest-common-contiguous-subsequence problem to me. — DSM, Jun 08 '12 at 00:18

Amr · Accepted Answer · 2012-06-08T15:46:14.910

After taking a look at the links posted by @pyfunc, I came up with the following:

def shortest_of(lists):
    return min(lists, key=len)

def contains_sublist(lst, sublst):
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in xrange(len(lst)-n+1)) 

def longest_common(lists):
    if not lists:
        return ()
    res = set()    
    base = shortest_of(lists)
    length = len(base)

    for i in xrange(length, 0, -1):
        for j in xrange(length - i + 1):
            candidate = ', ' + str(base[j:i+j]).strip('[]') + ','
            #candidate = base[j:i+j]  

            for alist in lists:
                if not candidate in ', ' + str(alist).strip('[]') + ',':
                #if not contains_sublist(alist, candidate):   
                    break
            else:
                res.add(tuple([int(a) for a in candidate[2:-1].split(',')]))
                #res.add(tuple(candidate))

        if res:
            return tuple(res)    

    return ()

if __name__ == '__main__':
    a = [1,0,2,5,4,3,1]
    b = [1,2,5,4,3,0,1]
    c = [1,3,5,4,2,0,1]

    print longest_common([a,b,c])
    print longest_common([b,c])

output:

((5, 4),)
((0, 1), (5, 4))

EDIT:

Updated solution to use string conversions and matching as it happened to be way faster. Previous solution parts are commented out. Also, it now gives all possibilities.

nice, checking from large to small and not doing all the work up front (as i did), makes it way quicker :) — fraxel, Jun 08 '12 at 02:41
Yes, but I think mine would be slower if the largest common sublist contains very few elements. I'm not good at benchmarking, so I haven't tried. — Amr, Jun 08 '12 at 02:49
@Amr: good to see that you worked it out and posted your answer back — pyfunc, Jun 12 '12 at 20:17

score 1 · Answer 2 · edited May 23 '17 at 11:49

1

One idea is that you can convert your list into a string with

",".join(list)

and then the problem is transformed to longest matching substring in two strings.

Solution and discussion for that is there on SO at :

edited May 23 '17 at 11:49

Community

1
1

answered Jun 08 '12 at 00:10

pyfunc

65,343
15
148
136

1

Why convert to strings? The algorithm for longest substring would work for lists the same way – JBernardo Jun 08 '12 at 00:18
@JBernardo: Yes, I was thinking along largest substring and hence the answer. I will correct it. – pyfunc Jun 08 '12 at 00:20

fraxel · Answer 3 · 2012-06-08T02:25:14.060

400 lists of length 400 isn't too much of a problem. First break each sequence into all its possible subsequences, (a list of length N has around 0.5 * N ** 2 possible subsequences). Then intersect them all and take the longest one.

a = [1,0,2,5,4,3,1]
b = [1,2,5,4,3,0,1]
c = [1,3,5,4,2,0,1]

def longest_match_finder(lists):
    matches = []
    for a in lists:
        lengths = set()
        for leng in xrange(1,len(a)+1):
            lengths = lengths | set(tuple(a[i:i+leng]) 
                                    for i in xrange(len(a)-leng+1))
        matches.append(lengths)
    return max(set.intersection(*matches), key=len)

print longest_match_finder([a,b,c])
#Output:
(5, 4)

With 400 lists each with 400 elements, this takes around 280 seconds (on my very old machine). However if we use the same approach on just one list, but convert its sub-sequences and also all the others lists to strings (as first posted by @pyfunc), using str(list).strip('[]'), we can search much quicker. The same test runs in 21 seconds:

import ast

def longest_match_finder_2(lists):
    a = lists[0]
    lengths = set()
    for leng in xrange(1,len(a)+1):
        lengths = lengths | set(str(a[i:i+leng]).strip('[]') 
                                for i in xrange(len(a)-leng+1))
    for seq in lengths.copy():
        if not all([seq in str(i).strip('[]') for i in lists[1:]]):
            lengths.remove(seq)
    return ast.literal_eval(max(lengths, key=len))

We can use ast.literal_eval() to get a list back (safely) at the end.

There's a problem, try `longest_match_finder([a,b])`, the output is `(5,4,3)` while it should be `(2,5,4,3)` — Amr, Jun 08 '12 at 02:10
@Amr - Thanks, fixed it. Stupidly taking max on magnitude not length! — fraxel, Jun 08 '12 at 02:26
I updated my code my code to use strings too. there's a little problem though, eg: `[1,2,3]` would be `'1 , 2, 3'` which would match `[11,2,3]` `'11, 2, 3'`. I fell for it too :) — Amr, Jun 08 '12 at 03:41
@Amr - yeah, i noticed that, just as i realised i was far too tired, and should go to bed :) — fraxel, Jun 08 '12 at 08:16

score -1 · Answer 4 · answered Jun 08 '12 at 00:18

-1

You can use the list zip function to zip them into tuples and return the tuples where all the elements are the same.

a = [1,0,2,5,4,3,1]
b = [1,2,5,4,3,0,1]
c = [1,3,5,4,2,0,1]
zipped_tuples = zip(a, b, c)

You can try to leverage this to get the positional intersections.

answered Jun 08 '12 at 00:18

Fahd A.

134
1
2

2

This won't work, as the indices for the subsequences are not necessarily the same. – Joel Cornett Jun 08 '12 at 00:24

how to intersect lists in python with respect to index

4 Answers4

EDIT: