How can I count duplicates in a nested list based on first two elements in python

Question

I have a list in the form:

lst = [[1, 0, 0, 0], [1, 1, 0, 0], [2, 0, 0, 0], [2, 1, 0, 0], [2, 1, 0, 0], [1, 1, 0, 0], [3, 1, 0, 0], [1, 3, 0, 0], [2, 1, 0, 0], [2, 0, 0, 0]]

However, the last two sub-elements will always be zero at the start, so the list could equally be written as:

lst = [[1, 0], [1, 1], [2, 0], [2, 1], [2, 1], [1, 1], [3, 1], [1, 3], [2, 1], [2, 0]]

If that is easier.

I want to remove the duplicates from this list, count them, and set the 3rd sub-element to the count. For the list above, the result should be:

lst = [[1, 0, 1, 0], [1, 1, 2, 0], [2, 0, 2, 0], [2, 1, 3, 0], [3, 1, 1, 0], [1, 3, 1, 0]]

I have found explanations of how to remove duplicates in "Removing Duplicates from Nested List Based on First 2 Elements" and "Removing duplicates from list of lists in Python", but I don't know how to count the duplicates. The order of the elements in the overall list doesn't matter, but the order of the elements in the sub-lists must be preserved, as [1,3] and [3,1] aren't the same thing.

If this turns out to be a dead end, I could do something like hash the first two elements for counting, but only if I could get them back after counting.

Any help is appreciated. Sorry for dyslexia!

you need a Counter, some tuples (as lists are not hashable), and a list comprehension — njzk2, Jun 02 '14 at 17:42

score 1 · Answer 1 · answered Jun 02 '14 at 17:47

For example:

lst = [[1, 0, 0, 0], [1, 1, 0, 0], [2, 0, 0, 0], [2, 1, 0, 0], [2, 1, 0, 0], [1, 1, 0, 0], [3, 1, 0, 0], [1, 3, 0, 0], [2, 1, 0, 0], [2, 0, 0, 0]]

from collections import Counter

c = Counter(tuple(i) for i in lst)

print([list(item[0][0:2] + (item[1], 0)) for item in c.items()])

# [[1, 0, 1, 0], [1, 1, 2, 0], [2, 0, 2, 0], [2, 1, 3, 0], [3, 1, 1, 0], [1, 3, 1, 0]]
# (the order of the sublists may differ on Python versions older than 3.7)
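The Counter above is keyed on whole sublists, which matches "first two elements" only while the trailing entries are all equal. A minimal sketch of a variant keyed on the first two elements alone might look like the following. The names `pair_counts` and `counted` are illustrative, and the fourth element is assumed to be reset to 0.

```python
from collections import Counter

lst = [[1, 0, 0, 0], [1, 1, 0, 0], [2, 0, 0, 0], [2, 1, 0, 0], [2, 1, 0, 0],
       [1, 1, 0, 0], [3, 1, 0, 0], [1, 3, 0, 0], [2, 1, 0, 0], [2, 0, 0, 0]]

# Key each sublist on its first two elements only; tuples are hashable,
# and (1, 3) stays distinct from (3, 1).
pair_counts = Counter(tuple(sub[:2]) for sub in lst)

# Rebuild [a, b, count, 0] rows from the (pair, count) items.
# Assumption: the fourth element is simply reset to 0.
counted = [[a, b, n, 0] for (a, b), n in pair_counts.items()]

# The overall order does not matter here, so print a sorted copy.
print(sorted(counted))
# [[1, 0, 1, 0], [1, 1, 2, 0], [1, 3, 1, 0], [2, 0, 2, 0], [2, 1, 3, 0], [3, 1, 1, 0]]
```

Because the pair itself is the Counter key, the original two elements come straight back out of `pair_counts.items()` after counting.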

bbill · Accepted Answer · 2014-06-02T18:50:35.317

To elaborate on the great hint provided by njzk2:

1. Turn your list of lists into a list of tuples
2. Create a Counter from it
3. Get a dict from the Counter
4. Set the 3rd element of the sublists to the frequency from the Counter

from collections import Counter
lst = [[1, 0, 0, 0], [1, 1, 0, 0], [2, 0, 0, 0], [2, 1, 0, 0], [2, 1, 0, 0], [1, 1, 0, 0], [3, 1, 0, 0], [1, 3, 0, 0], [2, 1, 0, 0], [2, 0, 0, 0]]
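# Lists are not hashable, so convert each sublist to a tuple
list_of_tuples = [tuple(elem) for elem in lst]
# Map each distinct tuple to the number of times it occurs
dct = dict(Counter(list_of_tuples))
# The dict keys are the de-duplicated sublists
lst = [list(e) for e in dct]
for elem in lst:
    # The lookup runs before the assignment, so elem[2] is still 0 in the key
    elem[2] = dct[tuple(elem)]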

Edit: the line before the for loop now removes the duplicates. I didn't see that requirement before.

score 0 · Answer 3 · edited May 23 '17 at 10:26

You can do this to keep count of the duplicates:

lst = [[1, 0], [1, 1], [2, 0], [2, 1], [2, 1], [1, 1], [3, 1], [1, 3], [2, 1], [2, 0]]

for x in lst:
    count = 1
    tmpLst = list(lst)
    tmpLst.remove(x)
    for y in tmpLst:
        if x[0] == y[0] and x[1] == y[1]:
            count = count + 1
    x.append(count)
    #x.append(0) #if you want to add that 4th element

print(lst)

Result:

[[1, 0, 1], [1, 1, 2], [2, 0, 2], [2, 1, 3], [2, 1, 3], [1, 1, 2], [3, 1, 1], [1, 3, 1], [2, 1, 3], [2, 0, 2]]

Then you can take lst and remove the duplicates as described in the links you posted.
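For completeness, the remaining de-duplication step could be sketched as below. It starts from the result printed above and keeps the first sublist seen for each pair of leading elements. The names `counted`, `seen` and `unique` are illustrative, and appending the trailing 0 is an assumption based on the 4-element format in the question.

```python
# Output of the counting loop above: each sublist is [a, b, count].
counted = [[1, 0, 1], [1, 1, 2], [2, 0, 2], [2, 1, 3], [2, 1, 3],
           [1, 1, 2], [3, 1, 1], [1, 3, 1], [2, 1, 3], [2, 0, 2]]

seen = set()   # pairs (a, b) that have already been emitted
unique = []
for sub in counted:
    key = (sub[0], sub[1])
    if key not in seen:
        seen.add(key)
        # sub + [0] builds a new list, so counted itself is left untouched
        unique.append(sub + [0])

print(unique)
# [[1, 0, 1, 0], [1, 1, 2, 0], [2, 0, 2, 0], [2, 1, 3, 0], [3, 1, 1, 0], [1, 3, 1, 0]]
```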

score 0 · Answer 4 · answered Jun 02 '14 at 22:38

A different (maybe functional) approach.

lst = [[1, 0, 0, 0], [1, 1, 0, 0], [2, 0, 0, 0], [2, 1, 0, 0],
       [2, 1, 0, 0], [1, 1, 0, 0], [3, 1, 0, 0], [1, 3, 0, 0],
       [2, 1, 0, 0], [2, 0, 0, 0]]

def rec_counter(lst):
    # Inner method that is called at the end. Receives a
    # list, the current element to be compared and an accumulator
    # that will contain the result.
    def counter(lst, elem, acc):
        new_lst = [x for x in lst if x != elem]
        elem[2] = lst.count(elem)
        acc.append(elem)
        if len(new_lst) == 0:
            return acc
        else:
            return counter(new_lst, new_lst[0], acc)
    # This part starts the recursion of the inner method. If the list
    # is empty, nothing to do. Otherwise, count starting with the first
    # element of the list and an empty accumulator.
    if len(lst) == 0:
        return []
    else:
        return counter(lst, lst[0], [])

print(rec_counter(lst))
# [[1, 0, 1, 0], [1, 1, 2, 0], [2, 0, 2, 0],
#  [2, 1, 3, 0], [3, 1, 1, 0], [1, 3, 1, 0]]
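The recursive version recurses once per distinct sublist and writes the counts into the sublists of the input, so it may run into Python's default recursion limit on inputs with many distinct pairs. As a rough alternative, not the approach above, the same counts could be produced iteratively with sorted and itertools.groupby. The names `first_two` and `grouped_counter` are hypothetical, the fourth element is assumed to be 0, and the result comes out sorted, which the question allows.

```python
from itertools import groupby

lst = [[1, 0, 0, 0], [1, 1, 0, 0], [2, 0, 0, 0], [2, 1, 0, 0],
       [2, 1, 0, 0], [1, 1, 0, 0], [3, 1, 0, 0], [1, 3, 0, 0],
       [2, 1, 0, 0], [2, 0, 0, 0]]

def first_two(row):
    # Grouping key: the first two elements, in their original order.
    return tuple(row[:2])

def grouped_counter(rows):
    result = []
    # groupby only merges adjacent rows with equal keys, so sort first.
    # sorted() returns a new list, leaving the input unchanged.
    for (a, b), group in groupby(sorted(rows, key=first_two), key=first_two):
        result.append([a, b, sum(1 for _ in group), 0])
    return result

print(grouped_counter(lst))
# [[1, 0, 1, 0], [1, 1, 2, 0], [1, 3, 1, 0], [2, 0, 2, 0], [2, 1, 3, 0], [3, 1, 1, 0]]
```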
