
I am creating a list of lists and want to prevent dupes. For example, I have:

mainlist = [[a,b],[c,d],[a,d]]

The next item (a list) to be added is [b,a], which is considered a duplicate of [a,b].

UPDATE

mainlist  = [[a,b],[c,d],[a,d]]   
swap = [b,a]   

for item in mainlist:
   if set(item) & set(swap):
     print "match was found", item
   else:
     mainlist.append(swap)

Any suggestions as to how I can test whether the next item to be added is already in the list?

idjaw
ChrisJ
  • Have you attempted this one yet? Would be cool to see your attempt and help you along with difficulties in your implementation. – idjaw Jun 29 '17 at 21:54
  • Seems like the order within the inner lists doesn't matter, so I suggest using a list of `set`s. Then you can simply check `x in list`. – a_guest Jun 29 '17 at 21:56
  • Even better: a set of frozensets, if order isn't important. – juanpa.arrivillaga Jun 29 '17 at 21:58
  • Just an FYI: Python has `frozenset`, which can be an element of a set, and `set`, which allows you to test for membership. Maybe a good idea. – Alex Huszagh Jun 29 '17 at 21:58
  • @a_guest I'm not sure that's a fair assumption to make without OP's clarification. For example, we may not want [a, a, b] and [b, a] to be considered the same. – cs95 Jun 29 '17 at 21:58
  • @Coldspeed damn. No frozen-counter objects... But maybe the OP can clarify on the requirements – juanpa.arrivillaga Jun 29 '17 at 21:59
  • @juanpa.arrivillaga Man, what's with OPs and their lack of clarifications. Sheesh. – cs95 Jun 29 '17 at 22:00
  • Thanks for the comments. Each inner list has 2 elements. – ChrisJ Jun 29 '17 at 22:01
  • Does the order matter? – juanpa.arrivillaga Jun 29 '17 at 22:02
  • @ChrisJ Can both elements be the same? – cs95 Jun 29 '17 at 22:02
  • @juanpa.arrivillaga, order doesn't matter, obviously, since `{a, b} == {b, a}`. Ideally, use a set of frozensets. We don't need multiset behavior. – Alex Huszagh Jun 29 '17 at 22:03
  • @AlexanderHuszagh That was exactly my thinking as well. If order _did_ matter, then `[a, b]` shouldn't be a duplicate of `[b, a]`. – Christian Dean Jun 29 '17 at 22:04
  • @AlexanderHuszagh I meant the order of the frozensets, i.e., should the outer container be ordered (e.g. a list). As for multisets, that was in case `{a, a, b}` needs to be distinct from `{a, b}`, but the OP already clarified, it is always pairs. – juanpa.arrivillaga Jun 29 '17 at 22:05
  • @coldspeed No, the elements in the inner lists cannot be the same. – ChrisJ Jun 29 '17 at 22:10
  • @ChrisJ Your existing code is flawed because even if one element from both sets intersect, you'd still declare a match. You'd essentially want the intersection to be size 2. You can check that with the `len` function. – cs95 Jun 29 '17 at 22:15
  • @coldspeed I realized that but wasn't sure how to fix it. Not sure how to use the len function to check whether both elements of the lists match. – ChrisJ Jun 29 '17 at 22:23
  • Possible duplicate of [Python: removing duplicates from a list of lists](https://stackoverflow.com/questions/2213923/python-removing-duplicates-from-a-list-of-lists) – TalkLittle Jun 30 '17 at 13:45
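The "set of frozensets" idea from the comments can be sketched as below, assuming (as discussed above) that the order of the outer collection does not matter. A frozenset ignores element order and is hashable, so it can live inside a set and be tested for membership in O(1) average time. The helper name `add_pair` is hypothetical.

```python
# Sketch of the "set of frozensets" suggestion from the comments.
# Assumption: the order of the outer collection does not matter.
pairs = {frozenset(p) for p in [['a', 'b'], ['c', 'd'], ['a', 'd']]}

def add_pair(pairs, new_pair):
    """Hypothetical helper: add new_pair unless an equal pair is present.

    Equality ignores element order. Returns True if the pair was added,
    False if it was a duplicate.
    """
    key = frozenset(new_pair)
    if key in pairs:          # O(1) average-time membership test
        return False
    pairs.add(key)
    return True

print(add_pair(pairs, ['b', 'a']))  # False: {'a', 'b'} is already stored
print(add_pair(pairs, ['b', 'c']))  # True
print(len(pairs))                   # 4
```

The trade-off juanpa.arrivillaga raised still applies: a set does not remember insertion order, so use this only if you never need the pairs back in the order they were added.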

2 Answers


Here's an approach using frozensets within a set to check for duplicates. It's a bit ugly since I'm invoking a function that works with global variables.

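def add_to_mainlist(new_list):
    key = frozenset(new_list)
    if key not in dups:
        dups.add(key)  # remember it so a later repeat is rejected too
        mainlist.append(new_list)

mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]

# frozensets are hashable and ignore element order, so they work as set members
dups = set()

for l in mainlist:
    dups.add(frozenset(l))

print("Before:", mainlist)
add_to_mainlist(['b', 'a'])  # same pair as ['a', 'b'], in the other order
print("After:", mainlist)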

This outputs:

Before: [['a', 'b'], ['c', 'd'], ['a', 'd']]
After: [['a', 'b'], ['c', 'd'], ['a', 'd']]

This shows that the new list was indeed not added to the original.

Here's a cleaner version that calculates the existing set on the fly inside a function that does everything locally:

def add_to_mainlist(mainlist, new_list):
    dups = set()
    for l in mainlist:
        dups.add(frozenset(l))

    if frozenset(new_list) not in dups:
        mainlist.append(new_list)

    return mainlist

mainlist = [['a', 'b'],['c', 'd'],['a', 'd']] 

print("Before:", mainlist)
mainlist = add_to_mainlist(mainlist, ['a', 'b']) # the assignment isn't needed, but done anyway :-)
print("After:", mainlist)
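The cleaner version rebuilds the set of frozensets on every call, so each insert costs time proportional to the length of `mainlist`. If many pairs are added, one option is to keep the list and the set together so they never have to be rebuilt. This is a minimal sketch; the class name `UniquePairs` and its methods are hypothetical, not part of any library.

```python
class UniquePairs:
    """Hypothetical helper: keeps pairs in insertion order (a list) plus a
    set of frozensets so each duplicate check is O(1) on average."""

    def __init__(self, pairs=()):
        self.items = []      # pairs in the order they were added
        self._seen = set()   # frozenset keys for order-insensitive lookup
        for pair in pairs:
            self.add(pair)

    def add(self, pair):
        """Append pair unless an equal pair (ignoring order) is present."""
        key = frozenset(pair)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.items.append(list(pair))
        return True

mainlist = UniquePairs([['a', 'b'], ['c', 'd'], ['a', 'd']])
print(mainlist.add(['b', 'a']))  # False, duplicate of ['a', 'b']
print(mainlist.add(['b', 'c']))  # True
print(mainlist.items)            # [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'c']]
```

Because the set is updated in `add`, it can never fall out of sync with the list, which also removes the need for global variables.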

Why doesn't your existing code work?

This is what you're doing:

...
for item in mainlist:
   if set(item) & set(swap):
     print "match was found", item
   else:
     mainlist.append(swap)

You're intersecting two sets and checking the truthiness of the result. That correctly rejects pairs with nothing in common, but if even one element is shared (for example, ['a', 'b'] and ['b', 'd']), you'd still declare a match, which is wrong.
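A quick way to see the false positive is to run the check on two pairs that share only one element:

```python
# Reproduce the false positive in the original check.
item, swap = ['a', 'b'], ['b', 'd']
common = set(item) & set(swap)

print(common)             # {'b'}
print(bool(common))       # True: the original `if` treats this as a match
print(len(common) == 2)   # False: the two pairs are not actually equal
```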

Ideally you'd want to check the length of the resulting set and make sure it is equal to 2 (this works because each inner list holds two distinct elements):

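mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
swap = ['b', 'a']

dups = False
for item in mainlist:
    # both elements shared means the same pair (each pair has 2 distinct elements)
    if len(set(item) & set(swap)) == 2:
        dups = True
        break

if not dups:
    mainlist.append(swap)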

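You'd also ideally want a flag to ensure that you did not find duplicates. Your previous code appends `swap` in the `else` branch as soon as a single item shares no elements with it, before the remaining items are checked, so it can add the same pair more than once (and it grows `mainlist` while iterating over it).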
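A more compact variant of the same check, offered as a sketch rather than the answer's own code, compares the sets directly and lets `any()` play the role of the flag. Set equality also works for inner lists of any length, as long as no list contains repeated elements.

```python
mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]

for swap in (['b', 'a'], ['b', 'c']):
    swap_set = set(swap)  # build the candidate's set once
    # Append only if no existing pair has exactly the same elements.
    if not any(set(item) == swap_set for item in mainlist):
        mainlist.append(swap)

print(mainlist)  # [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'c']]
```

`any()` stops at the first matching item, and the append runs only when no item matched, so each candidate is added at most once.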

Community
cs95
  • @ChrisJ added an explanation as to why your current code is non-functional. – cs95 Jun 29 '17 at 22:19
  • _"You'd also ideally want a flag to ensure that you did not find duplicates."_ - Or you could opt to wrap his code in a function and `return` if any of the lists matches `swap`. And if not, add `swap` to `mainlist`. Of course either method would work, but it seemed like a function would be a better choice, if only for reusability's sake. – Christian Dean Jun 29 '17 at 22:27
  • @ChristianDean I felt the need to keep that example slightly simple, so as not to introduce too many foreign elements OP might not be comfortable with. Yes, I did consider putting it in a function and returning. ;) – cs95 Jun 29 '17 at 22:28
  • OK. That makes sense. Nice answer by the way :-) +1 – Christian Dean Jun 29 '17 at 22:31
  • @coldspeed Man!! so much to learn!! Thanks to you and everyone else – ChrisJ Jun 29 '17 at 22:46
  • @ChrisJ Cheers. Enjoy your newfound enlightenment. :) – cs95 Jun 29 '17 at 22:46
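The function-and-`return` alternative mentioned in the comments might look like the sketch below. The name `add_if_new` is hypothetical, and the length-2 test assumes, as the OP clarified, that every inner list holds two distinct elements.

```python
def add_if_new(mainlist, swap):
    """Hypothetical helper: append swap unless an equal pair is present.

    Returns True if swap was appended, False if a duplicate was found.
    """
    for item in mainlist:
        # Pairs hold two distinct elements, so a 2-element intersection
        # means the same pair (possibly in the other order).
        if len(set(item) & set(swap)) == 2:
            return False  # duplicate: leave without appending
    mainlist.append(swap)  # reached only after every item was checked
    return True

mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
print(add_if_new(mainlist, ['b', 'a']))  # False
print(add_if_new(mainlist, ['b', 'c']))  # True
print(mainlist)  # [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'c']]
```

The early `return` replaces the flag: the append line is only reachable when the loop finishes without finding a match.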

If the order of your inner lists doesn't matter, then this can trivially be accomplished using frozenset()s:

>>> mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]   
>>> mainlist = [frozenset(sublist) for sublist in mainlist]
>>> 
>>> def add_to_list(lst, sublist):
...     if frozenset(sublist) not in lst:
...         lst.append(frozenset(sublist))
... 
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> 
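One caveat raised in the comments is that a frozenset collapses repeated elements, so `['a', 'a', 'b']` and `['b', 'a']` would count as duplicates. The OP said the two elements of a pair are never equal, so plain frozensets are fine here. If repeats ever did matter, a frozenset of `collections.Counter` items can stand in for the missing "frozen counter". This is a sketch; `multiset_key` is a hypothetical name, and the elements must be hashable.

```python
from collections import Counter

def multiset_key(items):
    """Hypothetical helper: a hashable, order-insensitive key that still
    counts repeats, built from the (element, count) pairs of a Counter."""
    return frozenset(Counter(items).items())

print(frozenset(['a', 'a', 'b']) == frozenset(['b', 'a']))        # True
print(multiset_key(['a', 'a', 'b']) == multiset_key(['b', 'a']))  # False
print(multiset_key(['b', 'a']) == multiset_key(['a', 'b']))       # True
```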

If you need to keep the inner lists as lists (for example, to preserve the order their elements were stored in), you can either do what @Coldspeed suggested (construct a set() of frozensets from your list, construct a frozenset() from the list to be added, and test for membership) or use sorted() to test whether the list to be added contains the same elements as any of the existing lists:

>>> def add_to_list(lst, sublist):
...     for l in lst:
...         # sorted() puts both lists in the same order before comparing
...         if sorted(sublist) == sorted(l):
...             return
...     lst.append(sublist)
... 
>>> mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>>
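The sorted() comparison re-sorts every existing list on each call. A possible refinement, sketched here under the assumption that the elements are mutually comparable (for example, all strings), is to store each list's sorted tuple in a set once and look new lists up there. The names `make_key`, `seen`, and `add_to_list_keyed` are hypothetical.

```python
def make_key(sublist):
    # Sorting gives the same tuple for ['a', 'b'] and ['b', 'a'];
    # the elements must be mutually comparable (e.g. all strings).
    return tuple(sorted(sublist))

mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
seen = {make_key(l) for l in mainlist}  # built once, not on every call

def add_to_list_keyed(lst, seen, sublist):
    """Hypothetical helper: append sublist unless an equal list
    (ignoring order) is already present."""
    key = make_key(sublist)
    if key not in seen:
        seen.add(key)
        lst.append(sublist)

add_to_list_keyed(mainlist, seen, ['b', 'a'])
add_to_list_keyed(mainlist, seen, ['d', 'b'])
print(mainlist)  # [['a', 'b'], ['c', 'd'], ['a', 'd'], ['d', 'b']]
```

Unlike a frozenset, a sorted tuple keeps repeated elements, so `['a', 'a', 'b']` and `['a', 'b']` get different keys.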
Christian Dean