13

Can anyone suggest a good way to remove duplicates from a list of nested lists, where duplicates are determined by the first element of each nested list?

The main list looks like this:

L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]

If a nested list has the same first element (L[k][0]) as one that has already occurred, I'd like to remove that list and get this result:

L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]

Can you suggest an algorithm to achieve this goal?

p.campbell
elfuego1

6 Answers

32

Do you care about preserving order / which duplicate is removed? If not, then:

list(dict((x[0], x) for x in L).values())

will do it, though for a duplicated key it keeps the last list seen rather than the first. If you want to preserve order and keep the first one you find, then:

def unique_items(L):
    found = set()
    for item in L:
        if item[0] not in found:
            yield item
            found.add(item[0])

print(list(unique_items(L)))
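
A dict can also keep the first occurrence while preserving order. The following is only a minimal sketch, assuming Python 3.7+ (where dicts keep insertion order); the name `unique_by_first` is hypothetical and not part of the answer above.

```python
def unique_by_first(rows):
    """Keep the first row seen for each value of row[0], in original order."""
    seen = {}
    for row in rows:
        # setdefault stores the row only if this key is not present yet,
        # so later duplicates are ignored.
        seen.setdefault(row[0], row)
    return list(seen.values())


L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]
print(unique_by_first(L))
# [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
```

Unlike the plain `dict(...)` one-liner, this variant never overwrites an existing key, which is what makes it keep the first list rather than the last.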
Brian
  • your conversion to a dict was so much more elegant than mine that I stole it :) – Jiaaro Jul 17 '09 at 14:02
  • Doesn't the first one also preserve order because dicts preserve order since Python 3.7 and the keys are inserted in the order that the comprehension produces them? – xuiqzy Oct 01 '20 at 13:49
4

Use a dict instead, like so:

L = {'14': ['65', 76], '2': ['5', 6], '7': ['12', 33]}
L['14'] = ['22', 46]

If you are receiving the first list from some external source, convert it like so:

L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]
L_dict = dict((x[0], x[1:]) for x in L)
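
Note that with this conversion a later duplicate key overwrites the earlier value, so '14' ends up mapped to ['22', 46]. The sketch below illustrates that and shows one possible way to keep the first occurrence instead; the names `last_wins`, `first_wins` and `as_lists` are made up for the illustration.

```python
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]

# Plain dict(): a later duplicate key overwrites the earlier value.
last_wins = dict((x[0], x[1:]) for x in L)
print(last_wins['14'])  # ['22', 46]

# Keep the first occurrence instead by skipping keys already stored.
first_wins = {}
for x in L:
    if x[0] not in first_wins:
        first_wins[x[0]] = x[1:]

# Rebuild the nested-list form (insertion order is kept on Python 3.7+).
as_lists = [[k] + v for k, v in first_wins.items()]
print(as_lists)  # [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
```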
Jiaaro
2

Use pandas:

import pandas as pd

L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46], ['7', 'a', 'b']]

df = pd.DataFrame(L)
df = df.drop_duplicates()

L_no_duplicates = df.values.tolist()

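Note that this only drops rows that are duplicated in every column. If you want to compare specific columns only, pass their labels instead (use [0] to compare just the first element, as in the question):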

df = df.drop_duplicates([1,2])
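
For the question's case, where only the first element decides what counts as a duplicate, something like the following should work. It is a minimal sketch, assuming pandas is installed and the DataFrame uses the default integer column labels; `keep='first'` is the default and is spelled out only for clarity.

```python
import pandas as pd

L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]

df = pd.DataFrame(L)
# Compare rows on column 0 only and keep the first row for each value.
df = df.drop_duplicates(subset=[0], keep='first')

L_no_duplicates = df.values.tolist()
print(L_no_duplicates)
```

The surviving rows keep their original relative order, so the first list for each key is the one that remains.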
Rupert Schiessl
0

I am not sure what you meant by "another list", so I assume you mean the lists inside L:

a=[]
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46], ['7', 'a', 'b']]
for item in L:
    # print each nested list only the first time its first element is seen
    if item[0] not in a:
        a.append(item[0])
        print(item)
ghostdog74
  • 1
    This would be more efficient if you used a set for 'a' - you're O(N^2) using a list like that, and amortised O(N) using a set. – RichieHindle Jul 17 '09 at 13:58
  • That had not come to mind, thanks for the info. Nevertheless, that code works in older Python versions that don't come with set. ;) – ghostdog74 Jul 17 '09 at 14:14
0

If the order does not matter, the code below

print([[k] + v for (k, v) in dict([[a[0], a[1:]] for a in reversed(L)]).items()])

gives (on Python 3.7+, where dicts keep insertion order; older versions may print the lists in a different order)

[['14', '65', 76], ['7', '12', 33], ['2', '5', 6]]

Jinuk Kim
0
def Remove(duplicate):
    final_list = []
    for num in duplicate:
        if num not in final_list:
            final_list.append(num)
    return final_list

duplicate = [2, 4, 10, 20, 5, 2, 20, 4]
print(Remove(duplicate))
Eric Aya
  • 1
    Please provide some comments about your code and the changes you made to the original code. – SLDem Mar 28 '23 at 12:57