How to remove partial duplicates in a list of lists

Question

I have a list of lists that looks like this:

[[1,'a',2],[1,'b',2],[1,'a',3]]

I want to remove a sublist if its second element has already appeared in an earlier sublist (e.g. both are 'a').

I want the output to look like:

[[1,'a',2],[1,'b',2]]

That is, it keeps the first sublist from each group of duplicates.

Jean-François Fabre · Accepted Answer · 2018-10-11T19:47:02.040

3

That's a variant of "How do you remove duplicates from a list whilst preserving order?".

You can use a marker set to track the second elements already seen. Strings are immutable and therefore hashable, so they can be stored in a set:

lst = [[1,'a',2],[1,'b',2],[1,'a',3]]

marker_set = set()

result = []

for sublist in lst:
    second_elt = sublist[1]
    if second_elt not in marker_set:
        result.append(sublist)
        marker_set.add(second_elt)

print(result)

prints:

[[1, 'a', 2], [1, 'b', 2]]

(Using a set rather than a list for the markers gives an average O(1) membership test per item instead of O(N).)
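
The same loop can be wrapped in a small helper so that the position used for comparison becomes a parameter. This is only a minimal sketch: the name `dedupe_by_index` and its `index` parameter are made up for illustration, and it assumes the compared elements are hashable.

```python
def dedupe_by_index(rows, index=1):
    """Keep the first row for each distinct row[index], preserving order.

    Hypothetical helper: assumes row[index] is hashable.
    """
    seen = set()   # values of row[index] encountered so far
    result = []
    for row in rows:
        marker = row[index]
        if marker not in seen:   # average O(1) membership test
            seen.add(marker)
            result.append(row)
    return result


lst = [[1, 'a', 2], [1, 'b', 2], [1, 'a', 3]]
print(dedupe_by_index(lst))  # [[1, 'a', 2], [1, 'b', 2]]
```

Passing `index=2` would deduplicate on the third element instead.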

edited Oct 11 '18 at 19:47

answered Oct 11 '18 at 19:33


what are the benefits of using `set` here over a plain list – vash_the_stampede Oct 11 '18 at 19:46
Of course, I figured that, I should have known it was speed related didn't know if there were any other reasons, great work! – vash_the_stampede Oct 11 '18 at 19:47
thanks. Actually this question is borderline duplicate, but I figured that closing it as a duplicate wouldn't be enough for OP & others to solve that particular one. – Jean-François Fabre Oct 11 '18 at 19:47
if you have an exact duplicate you can share the link – Jean-François Fabre Oct 11 '18 at 19:51
1

actually this is a slight off variant of results I found, if I come across one , will link – vash_the_stampede Oct 11 '18 at 19:56

user3483203 · Answer 2 · 2018-10-11T19:47:36.270

2

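You can build a dictionary keyed on the second element while iterating over the list in reverse. Later assignments overwrite earlier ones, so the first occurrence in the original list is written last and wins: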

L = [[1,'a',2],[1,'b',2],[1,'a',3]]
dct = {j: (i, k) for i, j, k in reversed(L)}

{'a': (1, 2), 'b': (1, 2)}

Getting the result back as a list:

[[i, j, k] for j, (i, k) in dct.items()]

[[1, 'a', 2], [1, 'b', 2]]

While this solution always keeps the first occurrence of each duplicate, the relative order of the elements in the final result is not guaranteed to match the original list.
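
If the original order matters, one alternative (not the approach used in this answer, and assuming Python 3.7+ where dicts keep insertion order) is to iterate forward and use `dict.setdefault`, which stores a key only the first time it is seen. The names `first_seen` and `deduped` are illustrative.

```python
L = [[1, 'a', 2], [1, 'b', 2], [1, 'a', 3]]

first_seen = {}
for row in L:
    # setdefault stores row only if row[1] is not already a key,
    # so later duplicates are ignored
    first_seen.setdefault(row[1], row)

deduped = list(first_seen.values())
print(deduped)  # [[1, 'a', 2], [1, 'b', 2]]
```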

edited Oct 11 '18 at 19:47

answered Oct 11 '18 at 19:33


Honest question, is slicing that way better than `reversed`? – jedwards Oct 11 '18 at 19:33
1

Not really, but it means my one liners can be shorter :P – user3483203 Oct 11 '18 at 19:34
2

you can use `reversed` in your one-liner. It's better because it doesn't create a new list. – Jean-François Fabre Oct 11 '18 at 19:34
1

@Jean-FrançoisFabre but that's 4 more characters, so clearly much more inefficient ;) – user3483203 Oct 11 '18 at 19:35
2

Worth noting that this approach will work as expected only in Python 3.7 and above. Otherwise, `dct` may be in any arbitrary order. – DeepSpace Oct 11 '18 at 19:37
@DeepSpace I note in the answer the relative order may be changed. However, the correct duplicates will always be removed, because the input is a list that is iterated over – user3483203 Oct 11 '18 at 19:38
1

I just noticed he says the second element, not the first two, I need to update my answer anyways – user3483203 Oct 11 '18 at 19:45
dicts are ordered by implementation in python 3.6, and it's guaranteed in python 3.7. It was a long wait :) – Jean-François Fabre Oct 11 '18 at 19:50

vash_the_stampede · Answer 3 · 2018-10-11T20:08:58.083

1

lst = [[1,'a',2],[1,'b',2],[1,'a',3]]
res = []
for i in lst:
    if not any(i[1] == j[1] for j in res):  # compare second elements only
        res.append(i)

print(res)
# [[1, 'a', 2], [1, 'b', 2]]
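
Because a loop like this never hashes anything, it also copes with second elements that cannot be stored in a set, such as lists. A minimal sketch with made-up data (`rows` and its contents are hypothetical), comparing second elements with `==`; the cost is a scan of `res` for every item, so roughly O(N²) overall.

```python
# Second elements are lists, which are unhashable
rows = [[1, ['x'], 2], [1, ['y'], 2], [1, ['x'], 3]]

res = []
for row in rows:
    # linear scan of the rows kept so far; no hashing required
    if not any(row[1] == kept[1] for kept in res):
        res.append(row)

print(res)  # [[1, ['x'], 2], [1, ['y'], 2]]

# A set-based marker would fail on this data:
try:
    {rows[0][1]}
except TypeError as exc:
    print("set approach fails:", exc)
```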

edited Oct 11 '18 at 20:08

answered Oct 11 '18 at 19:37


don't sort just to use `groupby`, that's inefficient – Jean-François Fabre Oct 11 '18 at 19:38
1

@Jean-FrançoisFabre should look for alternatives that dont involve rearranging original list then? – vash_the_stampede Oct 11 '18 at 19:38
yes; `groupby` is really neat when the items to group are contiguous. Else `sort` kills the fun, introducing useless `O(N*log(N))` complexity when `O(N)` does it – Jean-François Fabre Oct 11 '18 at 19:40
1

@Jean-FrançoisFabre thank you , so reserve groupby for when we don't have to rearrange the list, got it – vash_the_stampede Oct 11 '18 at 19:41
@Jean-FrançoisFabre better? – vash_the_stampede Oct 11 '18 at 20:09
1

this solution will be slower but would work on non-hashable items so yes – Jean-François Fabre Oct 11 '18 at 20:32
