Removing duplicate equivalence classes from a list

Question

I am looking for a way to take a list of elements, some of which may be "equivalent", and return a list with no equivalent elements. My rough attempt is this:

unique = []
for item1 in mylist:
    include = 1
    for item2 in unique:
      if are_these_equivalent(item1, item2):
        include = 0
        break        #no need to examine anymore items
    if include == 1:
      unique.append(item1)

I would guess that algorithmically there's not much to be done, but it feels like the code is a little messy. Are there any nice ways to spruce it up?

EDIT: The equivalence I am using is whether two permutations are conjugate in S_n, but any abstraction of equivalent (in the sense of equivalence classes) should work the same.

Please **clarify your specific problem or add additional details to highlight exactly what you need**. As it's currently written, it’s hard to tell exactly what you're asking. See the [How to Ask](http://stackoverflow.com/help/how-to-ask) page for help clarifying this question — Pedro Lobito, May 01 '17 at 03:31
You can substantially improve performance of your code by making `unique` a `set()` (change `unique.append` to `unique.add`), because sets have much better membership lookup time. — DYZ, May 01 '17 at 03:39
Even though there isn't much to be done algorithmically... I think there is a bug in your code. The last two lines should be one indentation to the left. Now you are assuming an item is unique if the first other one is not similar. And addng the same item multiple times. — Antti_M, May 01 '17 at 03:41
And also comparing the item to itself, making it un-unique even when you fix the indentation. — Antti_M, May 01 '17 at 03:43
Someone already asked this question before: http://stackoverflow.com/questions/38924421/is-there-a-standard-way-to-partition-an-interable-into-equivalence-classes-given — DYZ, May 01 '17 at 04:06
@DYZ About making unique a set, there's no lookup happening. Will sets still be faster for iterating over and for appending? — Jacob Bond, May 01 '17 at 04:20
Actually, you are right. There are no lookups in you code, and appending to a list is indeed faster than insertion (iteration is faster, too). — DYZ, May 01 '17 at 04:24
You should consider using a [union-find data structure](https://en.wikipedia.org/wiki/Disjoint-set_data_structure). Simply union all of the equivalent items and then take the value at the root node of each tree in the forest. — pzp, May 01 '17 at 04:27

DYZ · Accepted Answer · 2017-05-01T04:25:18.670

2

In the light of what was said in the comments, here is an improved and corrected version of the code. It is only tangentially better than your original code.

unique = []
for item1 in mylist:
  for item2 in unique:
    if are_these_equivalent(item1, item2):
      break
  else:
    unique.append(item1)

edited May 01 '17 at 04:25

answered May 01 '17 at 03:43

DYZ

55,249
10
64
93

score 0 · Answer 2 · answered May 01 '17 at 03:27

0

You could use a set to remove any duplicates easily.

unique = list(set(unique))

set(unique) would create a set, which, by definition, cannot contain any repeats. Calling list() on that would turn the returned set back into a list. This is not necessary, and depends on what you plan to do with it after.

answered May 01 '17 at 03:27

Deem

7,007
2
19
23

`set()` uses a very specific definition of equivalence to remove duplicates that in general may not work for the OP. – DYZ May 01 '17 at 03:35
@DYZ that's true! We'll see what OP says, and I'll modify accordingly. – Deem May 01 '17 at 03:38

score -1 · Answer 3 · answered May 01 '17 at 04:51

It looks to me like you may be looking for a "Set"?

Something like this perhaps?

my_list = [1,2,3,4,5,2,3,5,6,
           'apples', 'apples', 'oranges', 'bananas', 'oranges']

unique = [i for i in set(my_list)]
print unique

Pretty much, when you use set(), you can pass it an iterable (a list) and it will only contain values once. If they come up again in the list, they are ignored.

Removing duplicate equivalence classes from a list

3 Answers3