0

I have a list of tuples:

lst = [('a','b'), ('c', 'b'), ('a', 'd'), ('e','f'), ('a', 'b')]

I want the following output list:

output = [('a','b'), ('e','f')]

i.e I want to compare the elements of first tuple with remaining tuples and remove the tuple which contains either one or more duplicate elements.

My attempt:

I was thinking of using for loops, but that wont be feasible once i have very large list. I browsed through following posts but could not get the right solution:

Removing duplicates members from a list of tuples How do you remove duplicates from a list in whilst preserving order?

If somebody could guide me the right direction, it will be very helpful. Thanks!

Community
  • 1
  • 1
Prakhar Mehrotra
  • 1,215
  • 4
  • 16
  • 21
  • 2
    I don't really understand the rule based on which you select the output. – millimoose Feb 05 '13 at 00:22
  • All you need to do is implement the same idea as you would do for a list of integers, but just change the comparison function. So you would say `('a','b')` and `('a','d')` are equivalent and therefore drop the second one. – Sudipta Chatterjee Feb 05 '13 at 00:24
  • @millimoose If element `a` or element `b` match an earlier element then remove it, so `('c', 'b')` is a duplicate of `('a', 'b')`, and `('a', 'd')` is a duplicate of `('a', 'b')`. – Ian Atkin Feb 05 '13 at 00:24
  • I think a for loop over the elements is really your only option unless you have more information about the lists elements. – Fantastic Mr Fox Feb 05 '13 at 00:24
  • @GordonFreeman So they're less tuples, and more sets of characters then? (A tuple is usually considered a single value, not a collection.) – millimoose Feb 05 '13 at 00:24
  • @Ben List comprehension would be faster. – Ian Atkin Feb 05 '13 at 00:25
  • you have to use some type of loops. if you need better performance you need to start with a better datastructure – John La Rooy Feb 05 '13 at 00:29
  • Also, if the first element of the list is "special", it's a little confusing to have it be part of the list. Would be clearer if it was a separate variable. – millimoose Feb 05 '13 at 00:29
  • Is the position of the 'a' or 'b' important? eg to you remove tuples with the fist item being 'b'? – John La Rooy Feb 05 '13 at 00:31

2 Answers2

6

Assuming that you want "duplicates" of all elements to be suppressed, and not just the first one, you could use:

lst = [('a','b'), ('c', 'b'), ('a', 'd'), ('e','f'), ('a', 'b')]

def merge(x):
    s = set()
    for i in x:
        if not s.intersection(i):
            yield i
            s.update(i)

gives

>>> list(merge(lst))
[('a', 'b'), ('e', 'f')]
>>> list(merge([('a', 'b'), ('c', 'd'), ('c', 'e')]))
[('a', 'b'), ('c', 'd')]
>>> list(merge([('a', 'b'), ('a', 'c'), ('c', 'd')]))
[('a', 'b'), ('c', 'd')]
DSM
  • 342,061
  • 65
  • 592
  • 494
4

Sets should help:

>>> s = map(set, lst)
>>> first = s[0]
>>> [first] + [i for i in s if not i & first]
[set(['a', 'b']), set(['e', 'f'])]

Or with ifilterfalse:

>>> from itertools import ifilterfalse
>>> s = map(set, lst)
>>> [first] + list(ifilterfalse(first.intersection, s))
[set(['a', 'b']), set(['e', 'f'])]
Blender
  • 289,723
  • 53
  • 439
  • 496
  • 1
    Thanks a lot!. It works. Just a quick question: Do you see any performance issue (for eg: speed) if i were to use your first method i.e. [i for i in s if not i & first] instead of ifilerfalse? My list is big. Thanks! – Prakhar Mehrotra Feb 05 '13 at 02:09