Remove duplicate items from an array based on first entry

Question

I know it is easy to remove duplicate items from a list such as:

lst = ['a' , 'b' , 'c' , 'c' , 'd' , 'd' ]

by using the method:

lst = list(dict.fromkeys(lst))
#output
lst = ['a' , 'b' , 'c' , 'd']

However, this method does not work if the list is made up of two-element lists like this:

lst = [['a','1'],['b','2'],['b','1'],['c','3'],['c','2']]

I would like to remove all the entries where the first element is duplicated, leaving behind the first instance of each regardless of the second element. So the output should be:

lst = [['a','1'],['b','2'],['c','3']]
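
As a minimal sketch of why the `dict.fromkeys` approach breaks down here (the names `error_message` and `as_tuples` are illustrative and not part of the original question): lists are unhashable, so they cannot be used as dict keys, and converting each pair to a tuple only removes pairs that match in both elements.

```python
lst = [['a', '1'], ['b', '2'], ['b', '1'], ['c', '3'], ['c', '2']]

# Lists are unhashable, so using them as dict keys raises TypeError.
error_message = None
try:
    list(dict.fromkeys(lst))
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Tuples are hashable, but whole-pair deduplication does not help:
# every pair here is distinct, so nothing is removed.
as_tuples = list(dict.fromkeys(tuple(pair) for pair in lst))
print(len(as_tuples))
```

So the deduplication has to be keyed on the first element only, which is what the answers address.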

Does this answer your question? [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) — Sayse, Nov 29 '19 at 15:27
Do you want to remove duplicates from the whole list or just consecutive duplicates? E.g., for `lst=[['a',1], ['b',2], ['a',2], ['a',3]]`, would the result be `[['a',1], ['b',2], ['a',2]]`? — Andrej Kesely, Nov 29 '19 at 15:27
I want to remove any entry where the first element has already appeared in the list. So for your example the output should be [['a','1'],['b','2']] — Maurio, Nov 30 '19 at 12:17

score 4 · Accepted Answer · answered Nov 29 '19 at 15:40

You can use itertools.groupby, as long as entries sharing a first element are adjacent in the list:

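import itertools as it

lst = [['a', '1'], ['b', '2'], ['b', '1'], ['c', '3'], ['c', '2']]

# The lambda groups consecutive entries by their first element;
# next(g) returns the first entry of each group.
result = [next(g) for k, g in it.groupby(lst, key=lambda x: x[0])]
print(result)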

Result:

[['a', '1'], ['b', '2'], ['c', '3']]

answered Nov 29 '19 at 15:40

r.ook

It's worth noting this requires `lst` to be sorted. – snakecharmerb Nov 30 '19 at 07:53
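
To make the comment above concrete, here is a hedged sketch using the unsorted example from the question's comments (with string values). `groupby` only merges adjacent entries, so a first element that reappears later starts a new group. The helper `unique_by_first` is a hypothetical name introduced for illustration, not part of the original answer; it tracks already-seen first elements in a set and therefore does not depend on ordering.

```python
import itertools as it

unsorted_lst = [['a', '1'], ['b', '2'], ['a', '2'], ['a', '3']]

# groupby starts a new group whenever the key changes, so the second
# run of 'a' entries is treated as a separate group.
grouped = [next(g) for _, g in it.groupby(unsorted_lst, key=lambda x: x[0])]
print(grouped)  # [['a', '1'], ['b', '2'], ['a', '2']]


def unique_by_first(pairs):
    """Yield each pair whose first element has not been seen before."""
    seen = set()
    for pair in pairs:
        if pair[0] not in seen:
            seen.add(pair[0])
            yield pair


deduped = list(unique_by_first(unsorted_lst))
print(deduped)  # [['a', '1'], ['b', '2']]
```

Sorting by the first element before calling `groupby` would also work, since Python's sort is stable, but it changes the order of the output.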

score 0 · Answer 2 · answered Nov 29 '19 at 15:26

You can try this, but note that `dict(lst)` keeps the last value for each key rather than the first:

>>> lst = [['a','1'],['b','2'],['b','1'],['c','3'],['c','2']]
>>> dict(lst)
{'a': '1', 'b': '1', 'c': '2'}

>>> [k for k,_ in dict(lst).items()]
['a', 'b', 'c']

>>> [[k,v] for k,v in dict(lst).items()]
[['a', '1'], ['b', '1'], ['c', '2']]
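
The output above keeps the last value per key, whereas the question asks for the first. One possible adjustment that stays close to the `dict(lst)` idea is sketched below: building the dict from the reversed list makes the earliest value win, and the key order is then taken from `dict(lst)`. The variable names are illustrative, and the sketch assumes Python 3.7+, where dicts preserve insertion order.

```python
lst = [['a', '1'], ['b', '2'], ['b', '1'], ['c', '3'], ['c', '2']]

# Later assignments overwrite earlier ones, so feeding the pairs in
# reverse order leaves the value from the first occurrence in place.
first_values = dict(reversed(lst))

# dict(lst) yields the keys in order of first appearance.
result = [[k, first_values[k]] for k in dict(lst)]
print(result)  # [['a', '1'], ['b', '2'], ['c', '3']]
```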

answered Nov 29 '19 at 15:26

abhilb

@GiacomoAlzetta Are you sure? The expected output clearly shows that the first pair should be kept. – Riccardo Bucco Nov 29 '19 at 15:38
@RiccardoBucco Oh, my bad. Who writes an example with increasing sequences and then swaps the last one just to trick others? I'll delete my comment. – Giacomo Alzetta Nov 29 '19 at 15:41

score 0 · Answer 3 · answered Nov 29 '19 at 15:39

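lst = [['a', '1'], ['b', '2'], ['b', '1'], ['c', '3'], ['c', '2']]

# Map each first element to the first entry seen with it.
d = {}
for pair in lst:
    first = pair[0]
    if first not in d:
        d[first] = pair

# Dicts preserve insertion order (Python 3.7+), so the original order is kept.
print(list(d.values()))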

outputs:

[['a', '1'], ['b', '2'], ['c', '3']]

answered Nov 29 '19 at 15:39

nonamer92
