
I am trying to remove duplicates from a list whose elements are pairs of a nested list and a float value:

list = [
[['Milk', 'Bread', 'Diaper'], 40.0], 
[['Milk', 'Diaper', 'Bread'], 40.0], 
[['Milk', 'Diaper', 'Beer'], 40.0], 
[['Milk', 'Beer', 'Diaper'], 40.0], 
[['Diaper', 'Bread', 'Milk'], 40.0], 
[['Diaper', 'Bread', 'Beer'], 40.0], 
[['Diaper', 'Milk', 'Bread'], 40.0], 
[['Diaper', 'Milk', 'Beer'], 40.0], 
[['Diaper', 'Beer', 'Bread'], 40.0], 
[['Diaper', 'Beer', 'Milk'], 40.0], 
[['Beer', 'Bread', 'Diaper'], 40.0], 
[['Beer', 'Milk', 'Diaper'], 40.0], 
[['Beer', 'Diaper', 'Bread'], 40.0], 
[['Beer', 'Diaper', 'Milk'], 40.0]
]

I need to remove items from the outer list by treating two nested lists as duplicates of each other regardless of the order of their items.

The output should contain one instance of each combination:

updated_list = [
[['Milk', 'Bread', 'Diaper'], 40.0],
[['Diaper', 'Beer', 'Bread'], 40.0], 
[['Beer', 'Diaper', 'Milk'], 40.0]
]

Thank you in advance.

Nick K

2 Answers


You can use Python's set and frozenset for this:

seen_it = set()       # combinations we've already encountered
updated_list = []
for line in list:
    # frozenset gives an order-insensitive, hashable key for the sublist
    key = frozenset(line[0])
    if key not in seen_it:
        seen_it.add(key)
        updated_list.append(line)

Notice that seen_it keeps track of which sublists we've seen before, so duplicates are never appended to updated_list.

Also notice that the keys in seen_it are of type frozenset, which ignores order like set does, but is immutable, so it can be stored inside another set.
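For example, run on a shortened version of your data (renamed to data to avoid shadowing the builtin list), the loop keeps only the first occurrence of each combination:

```python
data = [
    [['Milk', 'Bread', 'Diaper'], 40.0],
    [['Milk', 'Diaper', 'Bread'], 40.0],  # same combination as above
    [['Milk', 'Diaper', 'Beer'], 40.0],
    [['Beer', 'Diaper', 'Milk'], 40.0],   # same combination as above
]

seen_it = set()
updated_list = []
for line in data:
    key = frozenset(line[0])
    if key not in seen_it:
        seen_it.add(key)
        updated_list.append(line)

print(updated_list)
# [[['Milk', 'Bread', 'Diaper'], 40.0], [['Milk', 'Diaper', 'Beer'], 40.0]]
```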

Pi Marillion

You could use a set or, if the sublists can have duplicate elements, a collections.Counter.

A set is an unordered collection of unique elements, so {1, 2, 3} is equivalent to {3, 2, 1}. If you pass a list to set, it will create a set with the list's elements. However, if the same value was in the list twice, that information is lost in the set.

# These are both the set {1, 2, 3}
s1 = {3, 1, 2, 1}
s2 = set([2, 1, 3, 3, 2])
assert s1 == s2 # True

If you might have duplicates in the sublists, the data type you need is a multiset. Unfortunately, Python's standard library does not provide a dedicated multiset type. However, Counter covers many of a multiset's use cases, including equality comparison.

from collections import Counter
# These counters have different numbers of each value
c1 = Counter(['a', 'b', 'a', 'c'])
c2 = Counter(['c', 'b', 'b', 'a', 'c'])
assert c1 == c2 # False
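If the sublists can contain repeats, here is a sketch of the same kind of dedup loop using a Counter-based key. Counter itself is not hashable, so one option is to freeze its item counts into a frozenset:

```python
from collections import Counter

data = [
    [['Milk', 'Milk', 'Beer'], 40.0],
    [['Beer', 'Milk', 'Milk'], 40.0],   # same multiset, different order
    [['Milk', 'Beer', 'Beer'], 40.0],   # different counts, so kept
]

seen = set()
updated_list = []
for pair in data:
    # freeze the (element, count) pairs into a hashable, order-insensitive key
    key = frozenset(Counter(pair[0]).items())
    if key not in seen:
        seen.add(key)
        updated_list.append(pair)

print(updated_list)
# [[['Milk', 'Milk', 'Beer'], 40.0], [['Milk', 'Beer', 'Beer'], 40.0]]
```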

As for actually removing the duplicates, the itertools recipe unique_everseen should serve your needs. Note that its key function must return something hashable, and to ignore order within the sublists you would want a key like lambda pair: frozenset(pair[0]) rather than a plain operator.itemgetter.
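For reference, here is one version of the unique_everseen recipe from the itertools documentation, combined with an order-insensitive key (the sample data is shortened from your list):

```python
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    """Yield unique elements, preserving order. Remember all elements ever seen."""
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element

data = [
    [['Milk', 'Bread', 'Diaper'], 40.0],
    [['Diaper', 'Bread', 'Milk'], 40.0],  # duplicate combination, dropped
    [['Beer', 'Diaper', 'Milk'], 40.0],
]

# frozenset makes the key both hashable and order-insensitive
updated_list = list(unique_everseen(data, key=lambda pair: frozenset(pair[0])))
print(updated_list)
# [[['Milk', 'Bread', 'Diaper'], 40.0], [['Beer', 'Diaper', 'Milk'], 40.0]]
```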


Also, you shouldn't use "list" as a variable name; shadowing the builtin makes the list type unavailable for the rest of the scope.

jirassimok