6

I want to remove those tuples which had same values at index 0 except the first occurance. I looked at other similar questions but did not get a particular answer I am looking for. Can somebody please help me? Below is what I tried.

from itertools import groupby
import random
Newlist = []

abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]

Newlist = [random.choice(tuple(g)) for _, g in groupby(abc, key=lambda x: x[0])]
print Newlist

my expected output : [(1,2,3), (2,3,4), (0,2,0), (5,4,3)]

llllllllll
  • 16,169
  • 4
  • 31
  • 54
A.S
  • 305
  • 1
  • 4
  • 20

5 Answers5

5

A simple way is to loop over the list and keep track of which elements you've already found:

abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
found = set()
NewList = []
for a in abc:
    if a[0] not in found:
        NewList.append(a)
    found.add(a[0])
print(NewList)
#[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

found is a set. At each iteration we check if the first element in the tuple is already in found. If not, we append the whole tuple to NewList. At the end of each iteration we add the first element of the tuple to found.

pault
  • 41,343
  • 15
  • 107
  • 149
  • 2
    One caveat is that this only works if the first element of the tuple is hashable (which numbers, like in the given example, are, of course). – Graipher Apr 17 '18 at 14:22
  • A slight improvement: only add `a[0]` to `found` if it is not already in there (i.e. increase the indentation by one level for `found.add(a[0])`). – Jack Taylor Apr 17 '18 at 14:26
  • it will be better to `continue` if `a[0]` in `found` & append/add otherwise (no indentation problem as @JackTaylor stated) – Azat Ibrakov Apr 17 '18 at 14:30
  • Thank you @pault: I am using this solution on my code. – A.S Apr 17 '18 at 14:43
3

A better alternative using OrderedDict:

from collections import OrderedDict

abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
d = OrderedDict()
for t in abc:
    d.setdefault(t[0], t)
abc_unique = list(d.values())
print(abc_unique)

Output:

[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

Simple although not very efficient:

abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
abc_unique = [t for i, t in enumerate(abc) if not any(t[0] == p[0] for p in abc[:i])]
print(abc_unique)

Output:

[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
jdehesa
  • 58,456
  • 7
  • 77
  • 121
3

The itertools recipes (Python 2: itertools recipes, but basically no difference in this case) contains a recipe for this, which is a bit more general than the implementation by @pault. It also uses a set:

Python 2:

from itertools import ifilterfalse as filterfalse

Python 3:

from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Use it with:

abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = list(unique_everseen(abc, key=lambda x: x[0]))
print Newlist
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

This should be slightly faster because of the caching of the set.add method (only really relevant if your abc is large) and should also be more general because it makes the key function a parameter.

Apart from that, the same limitation I already mentioned in a comment applies: this only works if the first element of the tuple is actually hashable (which numbers, like in the given example, are, of course).

Graipher
  • 6,891
  • 27
  • 47
  • @A.S Fixed. In Python 2 it was called `ifilterfalse`. It still worked because in this case the `key` function was defined. – Graipher Apr 17 '18 at 14:50
2

@PatrickHaugh claims:

but the question is explicitly about maintaining the order of the tuples. I don't think there's a solution using groupby

I never miss an opportunity to (ab)use groupby(). Here's my solution sans sorting (once or twice):

from itertools import groupby, chain

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]

Newlist = list((lambda s: chain.from_iterable(g for f, g in groupby(abc, lambda k: s.get(k[0]) != s.setdefault(k[0], True)) if f))({}))

print(Newlist)

OUTPUT

% python3 test.py
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
%
cdlane
  • 40,441
  • 5
  • 32
  • 81
1

To use groupby correctly, the sequence must be sorted:

>>> [next(g) for k,g in groupby(sorted(abc, key=lambda x:x[0]), key=lambda x:x[0])]
[(0, 2, 0), (1, 2, 3), (2, 3, 4), (5, 4, 3)]

or if you need that very exact order of your example (i.e. maintaining original order):

>>> [t[2:] for t in sorted([next(g) for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])], key=lambda x:x[1])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

the trick here is to add one field for keeping the original order to restore after the groupby() step.

Edit: even a bit shorter:

>>> [t[1:] for t in sorted([next(g)[1:] for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
fferri
  • 18,285
  • 5
  • 46
  • 95