How should I get a list of duplicate sublists in a list?

Question

I'm attempting to create functions that'll allow me to get a list of the unique sublists of a list. The functions are working for some lists of lists and not for others and I'm not sure why.

What would be a working, robust way to get the indices of the duplicate sublists and then to build a list of them?

The following minimal working example illustrates the functionality. The duplicates are found correctly for list a but incorrectly for list b.

def indices_of_list_element_duplicates(x):
    seen = set()
    for index, element in enumerate(x):
        if isinstance(element, list):
            element = tuple(element)
        if element not in seen:
            seen.add(element)
        else:
            yield index

def list_element_duplicates(x):
    indices = list(indices_of_list_element_duplicates(x))
    return [x[index] for index in indices]

a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]

print(list_element_duplicates(a))

print("--------------------------------------------------------------------------------")

b = [[10], [15], [20], [10, 10], [10, 15], [10, 20], [15, 10], [15, 15], [15, 20], [20, 10], [20, 15], [20, 20], [10, 10, 10], [10, 10, 15], [10, 10, 20], [10, 15, 10], [10, 15, 15], [10, 15, 20], [10, 20, 10], [10, 20, 15], [10, 20, 20], [15, 10, 10], [15, 10, 15], [15, 10, 20], [15, 15, 10], [15, 15, 15], [15, 15, 20], [15, 20, 10], [15, 20, 15], [15, 20, 20], [20, 10, 10], [20, 10, 15], [20, 10, 20], [20, 15, 10], [20, 15, 15], [20, 15, 20], [20, 20, 10], [20, 20, 15], [20, 20, 20], [10], [15], [20], [10, 10], [10, 15], [10, 20], [15, 10], [15, 15], [15, 20], [20, 10], [20, 15], [20, 20], [10, 10, 10], [10, 10, 15], [10, 10, 20], [10, 15, 10], [10, 15, 15], [10, 15, 20], [10, 20, 10], [10, 20, 15], [10, 20, 20], [15, 10, 10], [15, 10, 15], [15, 10, 20], [15, 15, 10], [15, 15, 15], [15, 15, 20], [15, 20, 10], [15, 20, 15], [15, 20, 20], [20, 10, 10], [20, 10, 15], [20, 10, 20], [20, 15, 10], [20, 15, 15], [20, 15, 20], [20, 20, 10], [20, 20, 15], [20, 20, 20], [10], [15], [20], [10, 10], [10, 15], [10, 20], [15, 10], [15, 15], [15, 20], [20, 10], [20, 15], [20, 20], [10, 10, 10], [10, 10, 15], [10, 10, 20], [10, 15, 10], [10, 15, 15], [10, 15, 20], [10, 20, 10], [10, 20, 15], [10, 20, 20], [15, 10, 10], [15, 10, 15], [15, 10, 20], [15, 15, 10], [15, 15, 15], [15, 15, 20], [15, 20, 10], [15, 20, 15], [15, 20, 20], [20, 10, 10], [20, 10, 15], [20, 10, 20], [20, 15, 10], [20, 15, 15], [20, 15, 20], [20, 20, 10], [20, 20, 15], [20, 20, 20], [10], [15], [20], [10, 10], [10, 15], [10, 20], [15, 10], [15, 15], [15, 20], [20, 10], [20, 15], [20, 20], [10, 10, 10], [10, 10, 15], [10, 10, 20], [10, 15, 10], [10, 15, 15], [10, 15, 20], [10, 20, 10], [10, 20, 15], [10, 20, 20], [15, 10, 10], [15, 10, 15], [15, 10, 20], [15, 15, 10], [15, 15, 15], [15, 15, 20], [15, 20, 10], [15, 20, 15], [15, 20, 20], [20, 10, 10], [20, 10, 15], [20, 10, 20], [20, 15, 10], [20, 15, 15], [20, 15, 20], [20, 20, 10], [20, 20, 15], [20, 20, 20]]

print(list_element_duplicates(b))

Without knowing your expected input/output I'm not too sure, but I feel like changing to `element = tuple(sorted(element))` may help you out. — R Nar, Dec 07 '15 at 16:34
@PadraicCunningham Why do you think it doesn't work for `a`? Aren't `[1, 2]` and `[5, 2]` the duplicates? — d3pd, Dec 07 '15 at 17:08
@d3pd, ah OK, so you want to keep the dupes; your logic is still incorrect. — Padraic Cunningham, Dec 07 '15 at 17:19
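To see what the question's functions actually return for b, here is a small check. It assumes b is the 39-element block [10], [15], ..., [20, 20, 20] repeated four times, which is how the literal above appears; rebuilding it with itertools.product is only a shortcut for this sketch, not part of the question.

```python
from collections import Counter
from itertools import product


def indices_of_list_element_duplicates(x):
    # Yield the index of every occurrence after the first one
    seen = set()
    for index, element in enumerate(x):
        if isinstance(element, list):
            element = tuple(element)
        if element not in seen:
            seen.add(element)
        else:
            yield index


def list_element_duplicates(x):
    return [x[index] for index in indices_of_list_element_duplicates(x)]


# Assumed reconstruction of b: every 1-, 2- and 3-element sequence of
# 10, 15, 20 (39 sublists), repeated four times
block = [list(p) for n in (1, 2, 3) for p in product([10, 15, 20], repeat=n)]
b = block * 4

dupes = list_element_duplicates(b)
print(len(block), len(b), len(dupes))        # 39 156 117
print(set(Counter(map(tuple, b)).values()))  # {4}
```

Every sublist occurs four times, so each one is reported three times. The generator is doing what it was written to do; it just returns every repeat rather than each duplicated sublist once.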

Padraic Cunningham · Accepted Answer · 2015-12-07T17:54:55.593

You could use a Counter, mapping the sublists to tuples to get the counts, and keep only the sublists whose count is > 1:

from collections import Counter
a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]


cn = Counter(map(tuple,a))

print([sub for sub in a if cn[tuple(sub)] > 1])
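Since the question also asks for the indices, the same Counter can drive an enumerate pass. A minimal sketch; the function name duplicate_indices is hypothetical:

```python
from collections import Counter


def duplicate_indices(x):
    # Count each sublist, using tuples as hashable keys
    counts = Counter(map(tuple, x))
    # Keep the position of every sublist that occurs more than once
    return [i for i, sub in enumerate(x) if counts[tuple(sub)] > 1]


a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]
idx = duplicate_indices(a)
print(idx)                  # [0, 1, 5, 6]
print([a[i] for i in idx])  # [[1, 2], [1, 2], [5, 2], [5, 2]]
```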

To work with mixed types and return each duplicate only once:

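from collections import Counter
from collections.abc import Hashable  # collections.abc requires Python 3.3+

a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2], "foo", 123, 123]

def counts(x):
    # Yield a hashable key per element: hashables as-is, lists as tuples
    for ele in x:
        if isinstance(ele, Hashable):
            yield ele
        else:
            yield tuple(ele)


def unique_dupes(x):
    cnts = Counter(counts(x))
    for ele in x:
        t = ele
        if not isinstance(ele, Hashable):
            t = tuple(ele)
        # Yield the first occurrence of each duplicate, then delete its
        # count so later occurrences are skipped
        if cnts[t] > 1:
            yield ele
            del cnts[t]

print(list(unique_dupes(a)))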

Output:

[[1, 2], [5, 2], 123]

@Padraic, correct me if I'm wrong, but I believe the OP wants a list of the duplicate lists, no? — Iron Fist, Dec 07 '15 at 17:19
@IronFist, yep, I read it wrong initially; it's just a matter of changing `==` to `>`. — Padraic Cunningham, Dec 07 '15 at 17:19
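The difference between the two comparisons on the counts is easy to see on the question's list a: `== 1` keeps sublists that appear exactly once, while `> 1` keeps the repeated ones. A quick sketch:

```python
from collections import Counter

a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]
cn = Counter(map(tuple, a))

singles = [sub for sub in a if cn[tuple(sub)] == 1]  # appear exactly once
dupes = [sub for sub in a if cn[tuple(sub)] > 1]     # appear more than once
print(singles)  # [[2, 2], [3, 2], [4, 2]]
print(dupes)    # [[1, 2], [1, 2], [5, 2], [5, 2]]
```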

score 0 · Answer 2 · answered Dec 07 '15 at 17:34

The issue must come from these lines:

if isinstance(element, list):
    element = tuple(element)
if element not in seen:
    seen.add(element)

For example, if seen already contains (10, 15) and you then check whether [15, 10] (converted to (15, 10)) is in seen, the check returns False, because tuples are ordered.

If you consider [x, y] to be the same as [y, x], one fix is to sort each element before checking it:

if isinstance(element, list):
    element = tuple(sorted(element))
if element not in seen:
    seen.add(element)
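Putting that change back into the question's generator gives the sketch below. It only helps if order inside a sublist should be ignored: sorting makes [10, 15] and [15, 10] compare equal, which may not be what you want. The name indices_of_unordered_duplicates is hypothetical.

```python
def indices_of_unordered_duplicates(x):
    seen = set()
    for index, element in enumerate(x):
        if isinstance(element, list):
            # Sort first so [15, 10] and [10, 15] map to the same key (10, 15)
            element = tuple(sorted(element))
        if element not in seen:
            seen.add(element)
        else:
            yield index


x = [[10, 15], [15, 10], [20], [20]]
print(list(indices_of_unordered_duplicates(x)))  # [1, 3]
print((15, 10) in {(10, 15)})                    # False: tuples are ordered
```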

score 0 · Answer 3 · answered Oct 04 '17 at 14:05

It's easy with a list comprehension:

list_a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]

unique_list = []
duplicate_list = []
# Sort each sublist so [2, 1] and [1, 2] are treated as the same item
sorted_list = [sorted(item) for item in list_a]

# A plain loop rather than a list comprehension used only for side effects
for item in sorted_list:
    if item not in unique_list:
        unique_list.append(item)
    else:
        duplicate_list.append(item)

print(unique_list)
print(duplicate_list)
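The `item not in unique_list` test scans the whole list on every iteration. A variant that also tracks seen items in a set of tuples gives the same output while making each membership check O(1) on average; this is a sketch, not part of the original answer:

```python
list_a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]

seen = set()
unique_list = []
duplicate_list = []
for item in (sorted(sub) for sub in list_a):
    key = tuple(item)  # lists are unhashable, so a tuple is the set key
    if key in seen:
        duplicate_list.append(item)
    else:
        seen.add(key)
        unique_list.append(item)

print(unique_list)     # [[1, 2], [2, 2], [2, 3], [2, 4], [2, 5]]
print(duplicate_list)  # [[1, 2], [2, 5]]
```

Note that sorting changes the reported sublists themselves ([3, 2] comes back as [2, 3]).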

score -1 · Answer 4 · answered Dec 07 '15 at 16:54


Python lists have a beautiful built-in method for this called `count`. Using it, you can do:

a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]
dups = list()

for e in a:
    if a.count(e) > 1:
        dups.append(e)

This will give you a list called dups containing [[1,2],[1,2],[5,2],[5,2]]
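If you want each duplicated sublist only once, the same `count` idea can skip sublists already collected. Like the loop above, this is quadratic (`count` and `in` each scan a list), so it is only suitable for short lists:

```python
a = [[1, 2], [1, 2], [2, 2], [3, 2], [4, 2], [5, 2], [5, 2]]
dups = []

for e in a:
    # Keep e if it repeats and has not been collected yet
    if a.count(e) > 1 and e not in dups:
        dups.append(e)

print(dups)  # [[1, 2], [5, 2]]
```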


m_callens


my bad, post is fixed...:/ – m_callens Dec 07 '15 at 16:56

