How to remove list items depending on predecessor in Python

Question

Given a Python list, I want to remove consecutive 'duplicates'. The duplicate value, however, is an attribute of the list item (in this example, the tuple's first element).

Input:

[(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

Desired Output:

[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]

Cannot use set or dict, because order is important.

Cannot use list comprehension [x for x in somelist if not determine(x)], because the check depends on predecessor.

What I want is something like:

mylist = [...]

for i in range(len(mylist)):
    if mylist[i-1].attr == mylist[i].attr:
        mylist.remove(i)

What is the preferred way to solve this in Python?

[Python 3.6 maintain dict order](https://stackoverflow.com/a/39537308/5168011) — Guy, Apr 17 '19 at 08:59
Possible duplicate of [Dictionaries: How to keep keys/values in same order as declared?](https://stackoverflow.com/questions/1867861/dictionaries-how-to-keep-keys-values-in-same-order-as-declared) — Guy, Apr 17 '19 at 08:59
Are you only concerned with *consecutive* duplicates? That is, if the last item of the list was also `1, 'a'`, would that be a duplicate of the first? — Daniel Roseman, Apr 17 '19 at 09:00
What should be the result for `[(1, 'a'), (2, 'a'), (1, 'a')]` ? Should it be `[(1, 'a'), (2, 'a'), (1, 'a')]` or `[(1, 'a'), (2, 'a')]` ? — Cid, Apr 17 '19 at 09:01
Yes, it's about consecutive duplicates. The output for `[(1, 'a'), (2, 'a'), (1, 'a')]` should be `[(1, 'a'), (2, 'a'), (1, 'a')]`. — Sparkofska, Apr 17 '19 at 09:02
@gmds , yes, `(1, 'a')` and `(1, 'b')` are considered equal duplicates in this case. — Sparkofska, Apr 17 '19 at 09:19
@Sparkofska In that case, you will need to use my solution that specifies a key function to compare only on the first element of the `tuple`. — gmds, Apr 17 '19 at 09:20
@gmds You're right. Accepted and edited example accordingly. — Sparkofska, Apr 17 '19 at 09:27

gmds · Accepted Answer · 2019-04-17T09:35:57.657

17

You can use itertools.groupby (demonstration with more data):

from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (3, 'a')]

[next(group) for key, group in groupby(data, key=itemgetter(0))]

Output:

[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]
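To see why taking `next(group)` works, it can help to look at the runs that `groupby` produces. The following is a minimal sketch using the question's input; the names `runs` and `firsts` are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# groupby starts a new group whenever the key changes between neighbours,
# so equal keys that are not adjacent end up in separate groups.
runs = [(key, list(group)) for key, group in groupby(data, key=itemgetter(0))]
for key, items in runs:
    print(key, items)

# Taking the first item of each run gives the desired output.
firsts = [items[0] for _, items in runs]
print(firsts)  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```

Because the key `2` appears in two separate runs, the trailing `(2, 'e')` survives, which is what the question asks for.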

For completeness, an iterative approach based on other answers:

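result = []

# Compare each item with its successor; keep it when the first element changes.
for first, second in zip(data, data[1:]):
    if first[0] != second[0]:
        result.append(first)

# The final item has no successor, so it always closes the last run.
if data:
    result.append(data[-1])

result

Output:

[(1, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]

Note that this keeps the last duplicate of each run, instead of the first.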
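If keeping the last item of each run is actually what is wanted, `groupby` can express that as well. This is a minimal sketch on the same `data` list; `last_of_each_run` is just an illustrative name. Every run, including the final one, contributes exactly one item.

```python
from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (3, 'a')]

# Materialise each run and keep its final item instead of its first.
last_of_each_run = [list(group)[-1] for _, group in groupby(data, key=itemgetter(0))]
print(last_of_each_run)  # [(1, 'a'), (2, 'b'), (3, 'a'), (4, 'a'), (2, 'a'), (3, 'a')]
```

Building a list per run costs memory proportional to the longest run, which is usually negligible for short lists.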

edited Apr 17 '19 at 09:35

answered Apr 17 '19 at 09:04

gmds


You don't need any key parameter, just take the key of each group – yatu Apr 17 '19 at 09:05
@yatu The question says "the duplicate value is an attribute of the `list`", which means that that wouldn't work if `(2, 'a')` and `(2, 'b')` are considered equal. – gmds Apr 17 '19 at 09:06
I see, yes, in that case it does make sense @gmds. It's hard to tell from this example, however. IMO, if that was what the OP meant, a more general example would make more sense – yatu Apr 17 '19 at 09:09

yatu · Answer 2 · 2019-04-17T09:08:13.597

12

In order to remove consecutive duplicates, you could use itertools.groupby:

from itertools import groupby

l = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]

# Without a key function, groupby compares whole tuples, so each key k is the item itself.
[k for k, _ in groupby(l)]
# [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
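The difference between comparing whole tuples and comparing only the first element shows up on the question's own input. This is a small sketch for comparison; the names `by_whole_item` and `by_first_element` are illustrative.

```python
from itertools import groupby
from operator import itemgetter

data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]

# Without a key, neighbours are merged only if the whole tuples are equal.
by_whole_item = [k for k, _ in groupby(data)]

# With a key, neighbours are merged whenever their first elements are equal.
by_first_element = [next(g) for _, g in groupby(data, key=itemgetter(0))]

print(by_whole_item)     # [(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
print(by_first_element)  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```

Only the keyed version drops `(2, 'c')`, which matches the desired output in the question.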

edited Apr 17 '19 at 09:08

answered Apr 17 '19 at 08:59

yatu


Henry Yik · Answer 3 · 2019-04-17T09:21:53.490

If I am not mistaken, you only need to look up the last value.

test = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a'), (4, 'a')]

result = []

for i in test:
    # Edited: compare only the first element, since OP considers (1, 'a') and (1, 'b') duplicates.
    # To compare whole items instead, use: if result and i == result[-1]:
    if result and i[0] == result[-1][0]:
        continue
    result.append(i)

print(result)

Output:

[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a')]
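The same loop can be wrapped in a reusable generator that takes a key function. This is only a sketch: the name `drop_consecutive_duplicates` and its `key` parameter are hypothetical and not taken from any library.

```python
def drop_consecutive_duplicates(items, key=lambda item: item):
    """Yield items, skipping any whose key equals that of the preceding item."""
    first = True
    previous_key = None
    for item in items:
        current_key = key(item)
        # Always keep the very first item; afterwards keep only on a key change.
        if first or current_key != previous_key:
            yield item
            previous_key = current_key
            first = False


test = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a'), (4, 'a')]
result = list(drop_consecutive_duplicates(test, key=lambda t: t[0]))
print(result)  # [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'a'), (4, 'a')]
```

Since it is a generator, it works on any iterable and does not need to index into the input.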

score 2 · Answer 4 · answered Apr 17 '19 at 18:35

If you just want to stick to list comprehension, you can use something like this:

>>> li = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]
>>> [li[i] for i in range(len(li)) if not i or li[i] != li[i-1]]
[(1, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]

Please note that `not i` is the Pythonic way of writing `i == 0`.

score 2 · Answer 5 · answered Apr 18 '19 at 05:45

You could also use enumerate and a list comprehension:

>>> data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
>>> [v for ix, v in enumerate(data) if not ix or v[0] != data[ix-1][0]]
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]

answered Apr 18 '19 at 05:45

Cloudomation


Nice one, because no need for any `import`s. Also `v[0]` can be replaced by any `v.get_attribute()`, which makes it quite universal. – Sparkofska Apr 18 '19 at 05:59
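To illustrate the attribute-based variant mentioned in the comment, here is a minimal sketch. The `Item` class and its `attr` and `label` fields are hypothetical stand-ins for whatever objects the list actually holds.

```python
from dataclasses import dataclass


@dataclass
class Item:  # hypothetical class standing in for the question's list items
    attr: int
    label: str


items = [Item(1, 'a'), Item(2, 'b'), Item(2, 'c'), Item(3, 'd'), Item(2, 'e')]

# Keep an item if it is the first one or its attribute differs from its predecessor's.
kept = [v for ix, v in enumerate(items) if not ix or v.attr != items[ix - 1].attr]
print([(v.attr, v.label) for v in kept])  # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
```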

score 1 · Answer 6 · answered Apr 17 '19 at 09:16

I'd change Henry Yik's proposal a little bit, making it a bit simpler. Not sure if I am missing something.

inputList = [(1, 'a'), (2, 'a'), (2, 'a'), (3, 'a'), (2, 'a')]
outputList = []
lastItem = None

for item in inputList:
    if item != lastItem:
        outputList.append(item)
        lastItem = item
print(outputList)
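One edge case with a `None` placeholder: if the list itself starts with `None`, that first item compares equal to the placeholder and is skipped. If that matters for your data, a unique sentinel object avoids it. This is a sketch; the name `_MISSING` is illustrative.

```python
_MISSING = object()  # unique placeholder; ordinary list items never compare equal to it

input_list = [None, None, (1, 'a'), (1, 'a'), None]
output_list = []
last_item = _MISSING

for item in input_list:
    # The first comparison is against the sentinel, so the first item is always kept.
    if item != last_item:
        output_list.append(item)
        last_item = item

print(output_list)  # [None, (1, 'a'), None]
```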

score 1 · Answer 7 · answered Apr 17 '19 at 15:23

You can easily zip the list with itself. Every element, except the first one, is zipped with its predecessor:

>>> L = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
>>> list(zip(L[1:], L))
[((2, 'b'), (1, 'a')), ((2, 'b'), (2, 'b')), ((2, 'c'), (2, 'b')), ((3, 'd'), (2, 'c')), ((2, 'e'), (3, 'd'))]

The first element is always part of the result, and then you filter the pairs on the condition and return the first element:

>>> [L[0]]+[e for e, f in zip(L[1:], L) if e[0]!=f[0]]
[(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
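Note that `L[0]` raises an `IndexError` on an empty list. A small variation, sketched here under the assumption that empty input should simply give an empty result, uses the slice `L[:1]` instead; the function name `dedupe` is illustrative.

```python
L = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]


def dedupe(seq):
    # seq[:1] is [] for an empty list, whereas seq[0] would raise IndexError.
    return seq[:1] + [e for e, f in zip(seq[1:], seq) if e[0] != f[0]]


print(dedupe(L))   # [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
print(dedupe([]))  # []
```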

score 1 · Answer 8 · answered Apr 24 '19 at 20:29

It's somewhat overkill, but you can use `reduce`, too:

from functools import reduce
data = [(1, 'a'), (2, 'b'), (2, 'b'), (2, 'c'), (3, 'd'), (2, 'e')]
reduce(lambda rslt, t: rslt if rslt[-1][0] == t[0] else rslt + [t], data, [data[0]])
# [(1, 'a'), (2, 'b'), (3, 'd'), (2, 'e')]
