How can I remove duplicate tuples from a list based on index value of tuple while maintaining the order of tuple?

Question

I want to remove those tuples which had same values at index 0 except the first occurance. I looked at other similar questions but did not get a particular answer I am looking for. Can somebody please help me? Below is what I tried.

from itertools import groupby
import random
Newlist = []

abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]

Newlist = [random.choice(tuple(g)) for _, g in groupby(abc, key=lambda x: x[0])]
print Newlist

my expected output : [(1,2,3), (2,3,4), (0,2,0), (5,4,3)]

With the `random.choice`, how can your attempt guarantee that it only keeps the first occurence? — Graipher, Apr 17 '18 at 14:20

score 5 · Accepted Answer · answered Apr 17 '18 at 14:19

5

A simple way is to loop over the list and keep track of which elements you've already found:

abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
found = set()
NewList = []
for a in abc:
    if a[0] not in found:
        NewList.append(a)
    found.add(a[0])
print(NewList)
#[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

found is a set. At each iteration we check if the first element in the tuple is already in found. If not, we append the whole tuple to NewList. At the end of each iteration we add the first element of the tuple to found.

answered Apr 17 '18 at 14:19

pault

41,343
15
107
149

2

One caveat is that this only works if the first element of the tuple is hashable (which numbers, like in the given example, are, of course). – Graipher Apr 17 '18 at 14:22
A slight improvement: only add `a[0]` to `found` if it is not already in there (i.e. increase the indentation by one level for `found.add(a[0])`). – Jack Taylor Apr 17 '18 at 14:26
it will be better to `continue` if `a[0]` in `found` & append/add otherwise (no indentation problem as @JackTaylor stated) – Azat Ibrakov Apr 17 '18 at 14:30
Thank you @pault: I am using this solution on my code. – A.S Apr 17 '18 at 14:43

jdehesa · Answer 2 · 2018-04-17T14:26:02.573

3

A better alternative using OrderedDict:

from collections import OrderedDict

abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
d = OrderedDict()
for t in abc:
    d.setdefault(t[0], t)
abc_unique = list(d.values())
print(abc_unique)

Output:

[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

Simple although not very efficient:

abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
abc_unique = [t for i, t in enumerate(abc) if not any(t[0] == p[0] for p in abc[:i])]
print(abc_unique)

Output:

[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

edited Apr 17 '18 at 14:26

answered Apr 17 '18 at 14:19

jdehesa

58,456
7
77
121

@ jdehesa: Thank you, it works. – A.S Apr 17 '18 at 14:43

Graipher · Answer 3 · 2018-04-18T07:36:38.083

The itertools recipes (Python 2: itertools recipes, but basically no difference in this case) contains a recipe for this, which is a bit more general than the implementation by @pault. It also uses a set:

Python 2:

from itertools import ifilterfalse as filterfalse

Python 3:

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Use it with:

abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = list(unique_everseen(abc, key=lambda x: x[0]))
print Newlist
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

This should be slightly faster because of the caching of the set.add method (only really relevant if your abc is large) and should also be more general because it makes the key function a parameter.

Apart from that, the same limitation I already mentioned in a comment applies: this only works if the first element of the tuple is actually hashable (which numbers, like in the given example, are, of course).

@A.S Fixed. In Python 2 it was called `ifilterfalse`. It still worked because in this case the `key` function was defined. — Graipher, Apr 17 '18 at 14:50

cdlane · Answer 4 · 2018-04-17T17:15:44.993

@PatrickHaugh claims:

but the question is explicitly about maintaining the order of the tuples. I don't think there's a solution using groupby

I never miss an opportunity to (ab)use groupby(). Here's my solution sans sorting (once or twice):

from itertools import groupby, chain

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]

Newlist = list((lambda s: chain.from_iterable(g for f, g in groupby(abc, lambda k: s.get(k[0]) != s.setdefault(k[0], True)) if f))({}))

print(Newlist)

OUTPUT

% python3 test.py
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
%

fferri · Answer 5 · 2018-04-17T14:44:01.663

1

To use groupby correctly, the sequence must be sorted:

>>> [next(g) for k,g in groupby(sorted(abc, key=lambda x:x[0]), key=lambda x:x[0])]
[(0, 2, 0), (1, 2, 3), (2, 3, 4), (5, 4, 3)]

or if you need that very exact order of your example (i.e. maintaining original order):

>>> [t[2:] for t in sorted([next(g) for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])], key=lambda x:x[1])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

the trick here is to add one field for keeping the original order to restore after the groupby() step.

Edit: even a bit shorter:

>>> [t[1:] for t in sorted([next(g)[1:] for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

edited Apr 17 '18 at 14:44

answered Apr 17 '18 at 14:17

fferri

18,285
5
46
95

1

That's true, but the question is explicitly about maintaining the order of the tuples. I don't think there's a solution using `groupby` – Patrick Haugh Apr 17 '18 at 14:19
@fferri: I need the order of tuple as shown in expected output. – A.S Apr 17 '18 at 14:27
@A.S I updated my answer – fferri Apr 17 '18 at 14:35
@PatrickHaugh: there's always a solution with any tool =P – fferri Apr 17 '18 at 14:40
@fferri: Thank you for the extra effort. It works now. – A.S Apr 17 '18 at 14:42

How can I remove duplicate tuples from a list based on index value of tuple while maintaining the order of tuple?

5 Answers5