Count Amount of duplicate Sublists in list with Python

Question

I've been researching this for quite some time now but can't seem to find how to do it properly. I have a list of 113287 sub-lists, each holding 2 integers with 2-3 digits each.

list = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

Now I want to count how many distinct sub-lists exist more than once. I don't want the total number of duplicates, and the index is irrelevant; I just want to know which combinations of values exist more than once.

The result for the example should be "2", since only the sub-lists "[222, 222]" and "[123, 456]" exist more than once.

If possible and only if it doesn't overcomplicate things, I would like to do it without external libraries.

I just can't seem to figure it out, any help is appreciated.

Where is your attempt, a for loop or something? Also, `list` is a bad variable name because it shadows the built-in `list` type. — , Dec 06 '21 at 19:46
Do you want to count [123, 456] and [456, 123] as the same sub-list or as two different ones? — , Dec 06 '21 at 20:21

score 2 · Accepted Answer · answered Dec 06 '21 at 19:51


Use collections.Counter to count the elements, then loop over the result to keep only those that have a count greater than 1, and sum:

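from collections import Counter

my_list = [[123, 456], [111, 111], [222, 222], [333, 333],
           [123, 456], [222, 222], [123, 456]]

# Convert each sub-list to a tuple so Counter can use it as a key
c = Counter(map(tuple, my_list))
# True counts as 1, so this sums the number of keys seen more than once
number = sum(v > 1 for v in c.values())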

output: 2
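The same Counter can also report which combinations are duplicated and how often, which is what the question ultimately wants to know. A minimal sketch; the list is redefined so the snippet runs on its own, and the name `duplicates` is just illustrative:

```python
from collections import Counter

my_list = [[123, 456], [111, 111], [222, 222], [333, 333],
           [123, 456], [222, 222], [123, 456]]

c = Counter(map(tuple, my_list))

# Keep only the combinations seen more than once, together with their counts
duplicates = {pair: n for pair, n in c.items() if n > 1}
print(duplicates)       # {(123, 456): 3, (222, 222): 2}
print(len(duplicates))  # 2
```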

NB: you need to convert the sub-lists to tuples because lists are unhashable, and Counter (a dict subclass) can only count hashable items.
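To illustrate the hashability point, here is a small sketch showing that `hash()` rejects a list but accepts the equivalent tuple. The variable names are illustrative only:

```python
pair_list = [123, 456]

# Lists are mutable, so Python refuses to hash them
try:
    hash(pair_list)
    list_is_hashable = True
except TypeError as exc:
    list_is_hashable = False
    print(exc)  # unhashable type: 'list'

# The equivalent tuple is hashable, and equal tuples hash equally
pair_tuple = tuple(pair_list)
print(hash(pair_tuple) == hash((123, 456)))  # True
```

Because equal tuples produce equal hashes, every occurrence of (123, 456) lands on the same Counter key.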

Answer by mozway


Oh my god, that's it. I tried it so many times with Counter, but ended up with things like Counter counting the values independently, disregarding that I had grouped them into tuples, e.g. "sum(y for y in Counter(tuple(x) for x in list.values() if y > 1)", and many other variations. I guess I have to learn Counter a bit more to figure out where my thought process failed. THANK YOU very much, though! – niristius Dec 06 '21 at 19:54
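As quoted, that attempt is not valid Python (the parentheses are unbalanced, and lists have no `.values()` method). The main logical issue, though, is that the `> 1` filter has to be applied to the counts Counter returns, not inside the expression that builds the Counter. A corrected one-liner along the same lines might look like this sketch, using `lst` instead of `list` to avoid shadowing the built-in:

```python
from collections import Counter

lst = [[123, 456], [111, 111], [222, 222], [333, 333],
       [123, 456], [222, 222], [123, 456]]

# Build the Counter first, then filter on its counts (the values)
result = sum(1 for count in Counter(tuple(x) for x in lst).values() if count > 1)
print(result)  # 2
```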
@mhopfner, it seems you accept this as the answer; please make sure to click the "checkmark" (near the rating) to accept it. Cheers! – Zach Young Dec 06 '21 at 20:07

score 1 · Answer 2 · answered Dec 06 '21 at 20:19

You can iterate over the set of unique sub-lists. But because lists are unhashable, you first need to convert each sub-list in lst to a tuple. Then simply count how many times each unique sub-list appears in lst:

lst = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]
# Iterate over the unique sub-lists (as tuples) and count each one in the original list
out = sum(1 for t in set(map(tuple, lst)) if lst.count(list(t)) > 1)

Output: 2

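Also, if you want [12, 34] and [34, 12] to count as the same sub-list (so that [[12, 34], [34, 12]] contains one combination occurring twice), you can build on @mozway's answer and normalize the pairs first: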

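from collections import Counter

# If a pair's reverse already appeared earlier, flip it to that orientation
# so that [12, 34] and [34, 12] end up as the same sub-list.
# Note: this modifies my_list in place.
for i, l in enumerate(my_list):
    if l[::-1] in my_list[:i]:
        my_list[i] = l[::-1]

c = Counter(map(tuple, my_list))
number = sum(v > 1 for v in c.values())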
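A simpler alternative, and a swapped-in technique rather than what this answer does, is to normalize each pair by sorting it, so [12, 34] and [34, 12] both become (12, 34). This leaves the original list untouched and avoids rescanning the prefix for every element. A minimal sketch on hypothetical sample data:

```python
from collections import Counter

# Hypothetical sample data: [12, 34] and [34, 12] should count as the same pair
pairs = [[12, 34], [34, 12], [56, 78], [111, 111], [78, 56], [90, 10]]

# sorted() makes the key independent of element order; tuple() makes it hashable
c = Counter(tuple(sorted(p)) for p in pairs)
number = sum(v > 1 for v in c.values())
print(number)  # 2
```

Here (12, 34) and (56, 78) each occur twice, while (111, 111) and (10, 90) occur once.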

Using `count` repeatedly gives poor performance: the whole list has to be scanned again for every unique element, which is quadratic in the worst case. Use `collections.Counter` instead (see [my answer](https://stackoverflow.com/a/70251143/16343464)) — mozway, Dec 06 '21 at 19:56
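Since the question asks to avoid external libraries where possible (note that `collections.Counter` is part of the standard library, so it is not an external dependency either), here is a single-pass sketch that uses only built-in sets. Like Counter, it looks at each sub-list once instead of rescanning the whole list; the helper name `count_repeated` is hypothetical:

```python
def count_repeated(sublists):
    """Return how many distinct sub-lists occur more than once."""
    seen = set()      # tuples encountered at least once
    repeated = set()  # tuples encountered at least twice
    for sub in sublists:
        key = tuple(sub)  # tuples are hashable, lists are not
        if key in seen:
            repeated.add(key)
        else:
            seen.add(key)
    return len(repeated)


lst = [[123, 456], [111, 111], [222, 222], [333, 333],
       [123, 456], [222, 222], [123, 456]]
print(count_repeated(lst))  # 2
```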

score 0 · Answer 3 · answered Dec 06 '21 at 20:13

You can build a de-duplicated copy of your list, then count how many of its elements occur more than once in the original:

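ls = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

# Build a de-duplicated copy, preserving first-seen order
uni = []
for l in ls:
    if l not in uni:
        uni.append(l)

# Count the unique sub-lists that occur more than once in the original
c = 0
for l in uni:
    if ls.count(l) > 1:
        c += 1

print(c)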

Output: 2

score 0 · Answer 4 · answered Dec 06 '21 at 20:35

MY_LIST = [[123, 456], [111, 111], [222, 222], [333, 333], [123, 456], [222, 222], [123, 456]]

Try this, adapted from: Removing duplicates from a list of lists

# Keep the first occurrence of each sub-list
uniques = []
for elem in MY_LIST:
    if elem not in uniques:
        uniques.append(elem)
print(uniques, '\n')

and then:

# For each unique sub-list, count its occurrences in the original list
repeated = {}
for elem in uniques:
    counter = 0
    for _elem in MY_LIST:
        if elem == _elem:
            counter += 1
    if counter > 1:
        repeated[str(elem)] = counter  # keyed by the sub-list's string form

print('amount of repeated sublists: {}'.format(len(repeated)))
print(repeated)

output:

[[123, 456], [111, 111], [222, 222], [333, 333]] 

amount of repeated sublists: 2
{'[123, 456]': 3, '[222, 222]': 2}
