4

i know this question might look like a duplicate. but i had a hard time trying to solve this and i couldn't find a helpful solution for my case

i'am implementing a genetic algorithm using python for the traveling salesman problem

assume we have those lists ( tours)

a = [1,0,2,5,4,3,1]
b = [1,2,5,4,3,0,1]
c = [1,3,5,4,2,0,1]

as you can see, the [5,4] is repeated in the whole 3 lists and a regular intersection would return all the elements in the list.

i want some function like intersect_list(a,b)

that returns [5,4]

is there a python built-in way to find this? or do you have any suggestion?.

Note : i know i can loop it to solve this , but please put in mind that in my case i have around 400 lists. and at the length of 401 each.

in other words : i want to see the common path between those lists.

please let me know if anything was unclear thanks in advance.

Moayyad Yaghi
  • 3,622
  • 13
  • 51
  • 67

4 Answers4

3

After taking a look at the links posted by @pyfunc, I came up with the following:

def shortest_of(lists):
    return min(lists, key=len)

def contains_sublist(lst, sublst):
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in xrange(len(lst)-n+1)) 

def longest_common(lists):
    if not lists:
        return ()
    res = set()    
    base = shortest_of(lists)
    length = len(base)

    for i in xrange(length, 0, -1):
        for j in xrange(length - i + 1):
            candidate = ', ' + str(base[j:i+j]).strip('[]') + ','
            #candidate = base[j:i+j]  

            for alist in lists:
                if not candidate in ', ' + str(alist).strip('[]') + ',':
                #if not contains_sublist(alist, candidate):   
                    break
            else:
                res.add(tuple([int(a) for a in candidate[2:-1].split(',')]))
                #res.add(tuple(candidate))

        if res:
            return tuple(res)    

    return ()

if __name__ == '__main__':
    a = [1,0,2,5,4,3,1]
    b = [1,2,5,4,3,0,1]
    c = [1,3,5,4,2,0,1]

    print longest_common([a,b,c])
    print longest_common([b,c])

output:

((5, 4),)
((0, 1), (5, 4))

EDIT:

Updated solution to use string conversions and matching as it happened to be way faster. Previous solution parts are commented out. Also, it now gives all possibilities.

Amr
  • 695
  • 4
  • 11
  • nice, checking from large to small and not doing all the work up front (as i did), makes it way quicker :) – fraxel Jun 08 '12 at 02:41
  • Yes, but I think mine would be slower if the largest common sublist contains very few elements. I'm not good at benchmarking, so I haven't tried. – Amr Jun 08 '12 at 02:49
  • @Amr: good to see that you worked it out and posted your answer back – pyfunc Jun 12 '12 at 20:17
1

One idea is that you can convert your list into a string with

",".join(list)

and then the problem is transformed to longest matching substring in two strings.

Solution and discussion for that is there on SO at :

  1. Longest common substring from more than two strings - Python
  2. http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python
Community
  • 1
  • 1
pyfunc
  • 65,343
  • 15
  • 148
  • 136
1

400 lists of length 400 isn't too much of a problem. First break each sequence into all its possible subsequences, (a list of length N has around 0.5 * N ** 2 possible subsequences). Then intersect them all and take the longest one.

a = [1,0,2,5,4,3,1]
b = [1,2,5,4,3,0,1]
c = [1,3,5,4,2,0,1]

def longest_match_finder(lists):
    matches = []
    for a in lists:
        lengths = set()
        for leng in xrange(1,len(a)+1):
            lengths = lengths | set(tuple(a[i:i+leng]) 
                                    for i in xrange(len(a)-leng+1))
        matches.append(lengths)
    return max(set.intersection(*matches), key=len)

print longest_match_finder([a,b,c])
#Output:
(5, 4)

With 400 lists each with 400 elements, this takes around 280 seconds (on my very old machine). However if we use the same approach on just one list, but convert its sub-sequences and also all the others lists to strings (as first posted by @pyfunc), using str(list).strip('[]'), we can search much quicker. The same test runs in 21 seconds:

import ast

def longest_match_finder_2(lists):
    a = lists[0]
    lengths = set()
    for leng in xrange(1,len(a)+1):
        lengths = lengths | set(str(a[i:i+leng]).strip('[]') 
                                for i in xrange(len(a)-leng+1))
    for seq in lengths.copy():
        if not all([seq in str(i).strip('[]') for i in lists[1:]]):
            lengths.remove(seq)
    return ast.literal_eval(max(lengths, key=len))

We can use ast.literal_eval() to get a list back (safely) at the end.

fraxel
  • 34,470
  • 11
  • 98
  • 102
  • There's a problem, try `longest_match_finder([a,b])`, the output is `(5,4,3)` while it should be `(2,5,4,3)` – Amr Jun 08 '12 at 02:10
  • @Amr - Thanks, fixed it. Stupidly taking max on magnitude not length! – fraxel Jun 08 '12 at 02:26
  • I updated my code my code to use strings too. there's a little problem though, eg: `[1,2,3]` would be `'1 , 2, 3'` which would match `[11,2,3]` `'11, 2, 3'`. I fell for it too :) – Amr Jun 08 '12 at 03:41
  • @Amr - yeah, i noticed that, just as i realised i was far too tired, and should go to bed :) – fraxel Jun 08 '12 at 08:16
-1

You can use the list zip function to zip them into tuples and return the tuples where all the elements are the same.

a = [1,0,2,5,4,3,1]
b = [1,2,5,4,3,0,1]
c = [1,3,5,4,2,0,1]
zipped_tuples = zip(a, b, c)

You can try to leverage this to get the positional intersections.

Fahd A.
  • 134
  • 1
  • 2