Removing duplicates from a sorted Python list while iterating

Question

I have sorted data as follows, and I want to compare the items and remove anything duplicated. Here I use a simple comparison of fields to test the code; the real requirement is a more complex comparison, so I need to compare each element with its successor explicitly.

The comparison is not that simple; this is just to show what I am trying to achieve. Several fields (but NOT all of them) need to be compared. If they have the same values, I want to remove the previous record and keep the newer one, which will have an incremented number. Hence an explicit comparison is required. What is wrong with pop() and append() here, even though I am not iterating over the deque I modify?

I tried both a list and a deque, but the duplicates remain. Is anything wrong with my code?

import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

dq = collections.deque()

for i in range(1, len(data)):
    prev_name = data[i-1]['name']
    prev_age = data[i-1]['age']
    next_name = data[i]['name']
    next_age = data[i]['age']

    dq.append(data[i-1])

    if prev_name == next_name and prev_age == next_age:
        dq.pop()
        dq.append(data[i])
    else:
        dq.append(data[i])

print(dq)

Output (actual): deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}])

Output (expected): deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}])

@python_user I think it's pretty clear what the input and output would be here. — Z4-tier, Jan 22 '21 at 05:33
I need the comparison to be explicit, since the requirement is not simple duplicate removal. — Indika Rajapaksha, Jan 22 '21 at 05:37
Does this answer your question? [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) — sahasrara62, Jan 22 '21 at 05:40

score 1 · Answer 1 · Robot Jung · 2021-01-22T05:57:13.967

You can try this code:

import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

dq = collections.deque()

for item in data:
    # keep an item only if an equal dict has not been kept already
    if item not in dq:
        dq.append(item)

print(dq)

output:

deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}])
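The "not in dq" membership check scans the whole deque on every iteration, so this approach is quadratic in the worst case. A minimal sketch of a linear-time alternative, assuming the compared field values are hashable: build a key from the chosen fields and remember which keys have been seen. KEY_FIELDS and dedupe_seen are hypothetical names for illustration. Unlike the sorted-input approaches, this also works when duplicates are not adjacent.

```python
from collections import deque

# Hypothetical: the fields that decide whether two records are duplicates
# (their values are assumed to be hashable)
KEY_FIELDS = ('name', 'age')


def dedupe_seen(records, fields=KEY_FIELDS):
    """Keep the first record for each key, preserving input order."""
    seen = set()      # keys already kept; average O(1) membership test
    result = deque()
    for record in records:
        key = tuple(record[f] for f in fields)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

print(dedupe_seen(data))
# deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}])
```

Because only the fields in KEY_FIELDS form the key, records that differ in other fields still count as duplicates, which matches the "several fields but not all" requirement.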

answered Jan 22 '21 at 05:43 by Robot Jung, edited Jan 22 '21 at 05:57


@Marc Oh, thank you, I made a mistake. (Fixed now.) – Robot Jung, Jan 22 '21 at 05:58

score 1 · Accepted Answer · answered Jan 22 '21 at 06:07

The problem with your code is that you append the previous data element first, and then, if the current and previous values are the same, you pop it off again. What you are not considering is what happens after you remove the previous element and add the current one in:

dq.pop()
dq.append(data[i])

In the next iteration, that same element is now data[i-1], so you add it again in:

dq.append(data[i-1])

So when the if condition is satisfied, pop() removes only the copy of data[i-1] that was just appended in this iteration, not the copy that was already added to dq in the previous iteration. That earlier copy stays in dq, which is why the same element ends up duplicated.
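To make this mechanism visible, here is the question's loop instrumented to record the ages in dq after each iteration. The snapshots list exists only for illustration, and the if/else is condensed because both of its branches append data[i].

```python
import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

dq = collections.deque()
snapshots = []  # ages in dq after each iteration (illustration only)

for i in range(1, len(data)):
    prev, nxt = data[i - 1], data[i]
    dq.append(prev)  # like the question: re-append data[i-1] every iteration
    if prev['name'] == nxt['name'] and prev['age'] == nxt['age']:
        dq.pop()     # removes only the copy appended on the line above
    dq.append(nxt)   # both branches of the original if/else append data[i]
    snapshots.append([d['age'] for d in dq])

for i, ages in enumerate(snapshots, start=1):
    print(f"after i={i}: {ages}")
# after i=1: [28]
# after i=2: [28, 28]
# after i=3: [28, 28, 28, 29]
```

Each iteration leaves behind the copy of data[i] appended in the previous one, so one extra 28 accumulates per duplicate.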

You can try this code:

import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
    {'name': 'Atomic', 'age': 29},
    {'name': 'Atomic', 'age': 30},
]

dq = collections.deque()
dq.append(data[0])

for i in range(1, len(data)):
    # Compare against the last element kept in dq, not against data[i-1]
    prev_name = dq[-1]['name']
    prev_age = dq[-1]['age']
    next_name = data[i]['name']
    next_age = data[i]['age']

    if prev_name == next_name and prev_age == next_age:
        continue  # duplicate of the last kept element: skip it
    dq.append(data[i])

print(dq)

Output:

deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}, {'name': 'Atomic', 'age': 30}])
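The question also asks that, when two neighbours match, the previous record be dropped and the newer one (with the higher incremental number) be kept, and that only some fields take part in the comparison. A minimal sketch of that variant, using the same compare-against-dq[-1] idea: on a match it overwrites the last kept element instead of skipping the new one. The is_same function and the seq field are hypothetical, standing in for the real comparison and the incremental number.

```python
import collections


def is_same(a, b):
    """Hypothetical explicit comparison: only 'name' and 'age' matter; 'seq' is ignored."""
    return a['name'] == b['name'] and a['age'] == b['age']


# Hypothetical data: 'seq' plays the role of the incremental number
data = [
    {'name': 'Atomic', 'age': 28, 'seq': 1},
    {'name': 'Atomic', 'age': 28, 'seq': 2},
    {'name': 'Atomic', 'age': 28, 'seq': 3},
    {'name': 'Atomic', 'age': 29, 'seq': 4},
]

dq = collections.deque()
for item in data:
    if dq and is_same(dq[-1], item):
        dq[-1] = item  # replace the previous record with the newer one
    else:
        dq.append(item)

print(dq)
# deque([{'name': 'Atomic', 'age': 28, 'seq': 3}, {'name': 'Atomic', 'age': 29, 'seq': 4}])
```

The "if dq" guard also covers an empty input list, where indexing data[0] up front would raise an IndexError.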

@Marc The first line of the question says the data is already sorted, and only the duplicates have to be removed. So the input you mentioned would work fine with this code once it is sorted. — Chetan, Jan 22 '21 at 07:06

score 0 · Answer 3 · answered Jan 22 '21 at 05:43

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

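# dict.fromkeys keeps first-seen order (a set does not), so the sorted order survives.
# Note: tuple(x.items()) depends on key insertion order, so equal dicts whose keys
# were inserted in a different order would not be treated as duplicates.
unique = dict.fromkeys(tuple(x.items()) for x in data)
print([dict(x) for x in unique])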

[{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}]
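Since the input is already sorted, duplicates are always adjacent, so there is no need to hash every record: itertools.groupby from the standard library groups consecutive items with equal keys and preserves the original order. A sketch, assuming the comparison can be expressed as a key function; record_key is a hypothetical name. Taking the last element of each group keeps the newest record, as the question asks; next(group) would keep the first instead.

```python
from itertools import groupby
from operator import itemgetter

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

# Hypothetical key: the fields that must match for adjacent records to be duplicates
record_key = itemgetter('name', 'age')

# groupby merges only *consecutive* items with equal keys, which suits sorted input
unique = [list(group)[-1] for _, group in groupby(data, key=record_key)]
print(unique)
# [{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}]
```

If the data were not sorted by the same key, groupby would leave non-adjacent duplicates in place.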
