Removing duplicates from tuples within a list

Question

I have a list of tuples:

lst = [('a','b'), ('c', 'b'), ('a', 'd'), ('e','f'), ('a', 'b')]

I want the following output list:

output = [('a','b'), ('e','f')]

i.e I want to compare the elements of first tuple with remaining tuples and remove the tuple which contains either one or more duplicate elements.

My attempt:

I was thinking of using for loops, but that wont be feasible once i have very large list. I browsed through following posts but could not get the right solution:

Removing duplicates members from a list of tuples How do you remove duplicates from a list in whilst preserving order?

If somebody could guide me the right direction, it will be very helpful. Thanks!

I don't really understand the rule based on which you select the output. — millimoose, Feb 05 '13 at 00:22
All you need to do is implement the same idea as you would do for a list of integers, but just change the comparison function. So you would say `('a','b')` and `('a','d')` are equivalent and therefore drop the second one. — Sudipta Chatterjee, Feb 05 '13 at 00:24
@millimoose If element `a` or element `b` match an earlier element then remove it, so `('c', 'b')` is a duplicate of `('a', 'b')`, and `('a', 'd')` is a duplicate of `('a', 'b')`. — Ian Atkin, Feb 05 '13 at 00:24
I think a for loop over the elements is really your only option unless you have more information about the lists elements. — Fantastic Mr Fox, Feb 05 '13 at 00:24
@GordonFreeman So they're less tuples, and more sets of characters then? (A tuple is usually considered a single value, not a collection.) — millimoose, Feb 05 '13 at 00:24
you have to use some type of loops. if you need better performance you need to start with a better datastructure — John La Rooy, Feb 05 '13 at 00:29
Also, if the first element of the list is "special", it's a little confusing to have it be part of the list. Would be clearer if it was a separate variable. — millimoose, Feb 05 '13 at 00:29
Is the position of the 'a' or 'b' important? eg to you remove tuples with the fist item being 'b'? — John La Rooy, Feb 05 '13 at 00:31

DSM · Accepted Answer · 2013-02-05T00:43:26.273

Assuming that you want "duplicates" of all elements to be suppressed, and not just the first one, you could use:

lst = [('a','b'), ('c', 'b'), ('a', 'd'), ('e','f'), ('a', 'b')]

def merge(x):
    s = set()
    for i in x:
        if not s.intersection(i):
            yield i
            s.update(i)

gives

>>> list(merge(lst))
[('a', 'b'), ('e', 'f')]
>>> list(merge([('a', 'b'), ('c', 'd'), ('c', 'e')]))
[('a', 'b'), ('c', 'd')]
>>> list(merge([('a', 'b'), ('a', 'c'), ('c', 'd')]))
[('a', 'b'), ('c', 'd')]

Blender · Answer 2 · 2013-02-05T00:31:05.590

4

Sets should help:

>>> s = map(set, lst)
>>> first = s[0]
>>> [first] + [i for i in s if not i & first]
[set(['a', 'b']), set(['e', 'f'])]

Or with ifilterfalse:

>>> from itertools import ifilterfalse
>>> s = map(set, lst)
>>> [first] + list(ifilterfalse(first.intersection, s))
[set(['a', 'b']), set(['e', 'f'])]

edited Feb 05 '13 at 00:31

answered Feb 05 '13 at 00:25

Blender

289,723
53
439
496

1

Thanks a lot!. It works. Just a quick question: Do you see any performance issue (for eg: speed) if i were to use your first method i.e. [i for i in s if not i & first] instead of ifilerfalse? My list is big. Thanks! – Prakhar Mehrotra Feb 05 '13 at 02:09

Removing duplicates from tuples within a list

2 Answers2