4

I've seen some questions here very related but their answer doesn't work for me. I have a list of lists where some sublists are repeated but their elements may be disordered. For example

g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]

The output should be, naturally according to my question:

g = [[1,2,3],[9,0,1],[4,3,2]]

I've tried with set but only removes those lists that are equal (I thought It should work because sets are by definition without order). Other questions i had visited only has examples with lists exactly duplicated or repeated like this: Python : How to remove duplicate lists in a list of list?. For now order of output (for list and sublists) is not a problem.

Community
  • 1
  • 1
Alejandro Sazo
  • 796
  • 15
  • 31

4 Answers4

7

(ab)using side-effects version of a list comp:

seen = set()

[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
Out[4]: [[1, 2, 3], [9, 0, 1], [4, 3, 2]]

For those (unlike myself) who don't like using side-effects in this manner:

res = []
seen = set()

for x in g:
    x_set = frozenset(x)
    if x_set not in seen:
        res.append(x)
        seen.add(x_set)

The reason that you add frozensets to the set is that you can only add hashable objects to a set, and vanilla sets are not hashable.

roippi
  • 25,533
  • 4
  • 48
  • 73
  • Your answer was the fastest. And curiously it gave the output that wanted but I haven't asked for. It is twice faster than @jterrace answer. For a list of 4205 sublists, yours did it in 0.02 seconds. – Alejandro Sazo May 25 '14 at 00:12
  • @AlejandroSazo check please performance of my answer with generator expresion inside: `g = [list(x) for x in set(frozenset(i) for i in (set(i) for i in g))]` I am just curious about the benchmark, and will be happy to see it:) – andilabs May 25 '14 at 00:16
  • 1
    @andi I will do it as soon as possible – Alejandro Sazo May 25 '14 at 00:35
3

If you don't care about the order for lists and sublists (and all items in sublists are unique):

result = set(map(frozenset, g))

If a sublist may have duplicates e.g., [1, 2, 1, 3] then you could use tuple(sorted(sublist)) instead of frozenset(sublist) that removes duplicates from a sublist.

If you want to preserve the order of sublists:

def del_dups(seq, key=frozenset):
    seen = {}
    pos = 0
    for item in seq:
        if key(item) not in seen:
            seen[key(item)] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]

Example:

del_dups(g, key=lambda x: tuple(sorted(x)))

See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • 1
    +1 Nice idea! Woul only add `[list(x) for x in set(map(frozenset, g))]` to produce output as the OP wanted. – andilabs May 24 '14 at 23:54
  • 2
    @andi: I left the `result` as a set of `frozenset`s to stress that all sublists and all items in each sublist are unique and the order doesn't matter. – jfs May 24 '14 at 23:56
1

I would convert each element in the list to a frozenset (which is hashable), then create a set out of it to remove duplicates:

>>> g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
>>> set(map(frozenset, g))
set([frozenset([0, 9, 1]), frozenset([1, 2, 3]), frozenset([2, 3, 4])])

If you need to convert the elements back to lists:

>>> map(list, set(map(frozenset, g)))
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
jterrace
  • 64,866
  • 22
  • 157
  • 202
1

What about using mentioned by roippi frozenset this way:

>>> g = [list(x) for x in set(frozenset(i) for i in [set(i) for i in g])]

[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
andilabs
  • 22,159
  • 14
  • 114
  • 151
  • If performance matters you can replace `list comprehension` with `generator expresion`. Just replace `[]` with `()` making it: `g = [list(x) for x in set(frozenset(i) for i in (set(i) for i in g))]` What is the difference you can read here: http://stackoverflow.com/questions/47789/generator-expressions-vs-list-comprehension – andilabs May 25 '14 at 00:14
  • 1
    Your first approach (list comprehension) uses 0.070491 seconds. Your approach with generator expression took 0.030556 seconds. The accepted answer time was 0.02 seconds :) – Alejandro Sazo May 25 '14 at 01:52