-2

Recently i encounter with a question about Find the intersection between sublists . that tells the sublist which have any (1 or more ) intersection together become one. for example the following list :

l=[[1,2,3],[0,13,6],[9,10],[3,4,5],[10,11],[6,7,50]]

must be converted to :

[[1, 2, 3, 4, 5],[0, 50, 6, 7, 13],[9, 10, 11]] 

So i wrote the following function to do it that works well with a good performance, i use set for its fast complexity for checking membership and also in inner loop i use slicing that compare the first index of main list with other elements in every loop and also note that the list will been decrease after each loop ,as its a recursion inside the loop . :

s=[set(i) for i in g if i]

def find_intersection(m_list):
    for i,v in enumerate(m_list) : 
        for j,k in enumerate(m_list[i+1:],i+1):
           if v & k:
              s[i]=v.union(m_list.pop(j))
              return find_intersection(m_list)
    return m_list

s=[set(i) for i in l if i]
print find_intersection(s)
[set([1, 2, 3, 4, 5]), set([0, 50, 6, 7, 13]), set([9, 10, 11])]

But i think it could be done with another solution maybe with better performance , i thought about collections.deque or maybe with numpy or just modifying my function and make it better ? . if you have any suggestion i would be grateful to hear about !

Community
  • 1
  • 1
Mazdak
  • 105,000
  • 18
  • 159
  • 188

2 Answers2

4

Here is a more efficient algorithm:

  1. For each unique number that is present in at least one of the sublists, let's maintain a list of indices of all sublists that contain this number. This part is O(n * log n) time if we use sorting to find unique numbers or O(n) if we use a hash table, where n is the total number of elements in all sublists.

  2. Let's create a graph where vertices are sublist indices and an edge is present if two indices appear together in at least one list of indices among all numbers. We need create at most O(n) edges(this part is slightly non-trivial: there is no need to create all edges explicitly, we can just add an edge from an element to the next one in each sublist for all unique elements due to transitivity). Here is some pseudo code:

    g = empty graph
    for elem in unique_elements:
        sublist_indices = list of indices of all sublists that contain this element
        for i = 1 ... size(sublist_indices - 1):
            g.add_edge(sublist_indices[i], sublist_indices[i + 1])
    
  3. Now we can find connected components in this graph using depth-first search in linear time(this graph is undirected).

  4. We know which sublists should be merged(they should be merged if and only if they are in the same connected component), so we can easily construct the answer.

The total time complexity is O(n). It is optimal because reading the input already requires O(n) operations.

kraskevich
  • 18,368
  • 4
  • 33
  • 45
  • This is not bad but i think you didn't understand my code well , i use sets for check membership because its `O(1)` and also while i use 2 for its not `O(n^2)` be cause the lenght of inner list and also the main list will been decrease after any loop. – Mazdak Jan 08 '15 at 11:38
  • @Kasra You code is at least `O(n^2)`(maybe even more). For example, if all sublists contain exactly one element and only the last two intersect, n * (n - 1) / 2 iterations will be done. – kraskevich Jan 08 '15 at 11:52
  • you make me in doubt . a bit . – Mazdak Jan 08 '15 at 11:58
  • @Kasra Try this testcase: [[1], [2], ..., [n - 2], [n], [n]]. There are no doubts that you solution makes `O(n ^ 2)` operations on it. – kraskevich Jan 08 '15 at 11:59
  • plus 1 . i must say that i knew this but i had a miss understanding . so this is in worst case ! as your solution is a good solution algorithmical , i need to wait for a pythonic suggestion – Mazdak Jan 08 '15 at 12:02
0
l=[[1,2,3],[0,13,6],[9,10],[3,4,5],[10,11],[6,7,50]]

temp = []

result = []

for i in range(len(l)):

    for j in range(i + 1, len(l)):
        if set(l[i]).intersection(l[j]):
            temp.append(l[i] + l[j])
            result.append(list(set(temp[i])))
print result
sebix
  • 2,943
  • 2
  • 28
  • 43
hariK
  • 2,722
  • 13
  • 18
  • I would be helpful if you would explain what and why the code is appropriate to solve the OP's question. – sebix Jan 08 '15 at 18:51