I have sorted data, as shown below. I want to compare adjacent records and remove any duplicates. The code here uses a simple field comparison to test the idea. The real requirement involves a more complex comparison, so I need to compare each record with its successor explicitly.

The real comparison is not this simple; the example only shows what I am trying to achieve. Several fields (but NOT all) need to be compared. When the values match, the previous record should be removed and the newer one, which carries an incremental number, should be kept. Hence the explicit comparison. What is wrong with using pop() and append(), given that I am not iterating over the deque itself?

I tried both a list and a deque, but the duplicates remain. Is anything wrong with the code?

import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

dq = collections.deque()

for i in range(1, len(data)):
    prev_name = data[i-1]['name']
    prev_age = data[i-1]['age']
    next_name = data[i]['name']
    next_age = data[i]['age']

    dq.append(data[i-1])

    if prev_name == next_name and prev_age == next_age:
        dq.pop()
        dq.append(data[i])
    else:
        dq.append(data[i])

print(dq)

Output (actual): deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}])

Output (expected): deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}])

Indika Rajapaksha

3 Answers

You can try this code, which appends an item only if an equal dict is not already in the deque:

import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

dq = collections.deque()

for item in data:
    # Skip any item that is already present in dq
    if item not in dq:
        dq.append(item)


print(dq)

Output:

deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}])
Robot Jung

The problem with your code is that every iteration first appends the previous element, data[i-1]. If the current and previous records are equal, you then remove that last element and add the current one:

dq.pop()
dq.append(data[i])

In the next iteration, however, you add that same element again:

dq.append(data[i-1])

So when the "if" condition is satisfied, pop() only removes the element just appended (i.e. data[i-1]), not the copy that the previous iteration left in dq. That copy stays behind, which is why the duplicates appear.
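
The duplication described above can be made visible by printing the ages held in dq after each pass of the question's loop. This is only a sketch: the `trace` list is a hypothetical helper added for illustration, and the two field comparisons are collapsed into a whole-dict comparison, which behaves the same for this sample data.

```python
import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

dq = collections.deque()
trace = []  # hypothetical helper: ages held in dq after each iteration

for i in range(1, len(data)):
    dq.append(data[i - 1])      # re-adds an element the previous pass already kept
    if data[i - 1] == data[i]:
        dq.pop()                # removes only the element appended one line above
    dq.append(data[i])          # both branches of the original if/else append data[i]
    trace.append([d['age'] for d in dq])
    print(i, trace[-1])

# Prints:
# 1 [28]
# 2 [28, 28]
# 3 [28, 28, 28, 29]
```

From the second pass onward the deque already holds data[i-1], so the extra append at the top of the loop is what introduces each duplicate.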

Instead, compare each new element with the last element kept in dq and append it only when they differ:

import collections

data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
    {'name': 'Atomic', 'age': 29},
    {'name': 'Atomic', 'age': 30},
]

dq = collections.deque()
dq.append(data[0])

for i in range(1, len(data)):
    prev_name = dq[-1]['name']
    prev_age = dq[-1]['age']
    next_name = data[i]['name']
    next_age = data[i]['age']

    # Append only when the record differs from the last one kept
    if prev_name != next_name or prev_age != next_age:
        dq.append(data[i])

print(dq)

Output:

deque([{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}, {'name': 'Atomic', 'age': 30}])
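
The code above keeps the first record of each run of duplicates. The question asks for the newer record to be kept and for only some fields to be compared, so a possible variant is sketched below. The `seq` field, the `KEY_FIELDS` tuple, and the `same_record` helper are assumptions made for illustration; they stand in for the incremental number and the real comparison, which the question does not show.

```python
import collections

# Hypothetical sample: 'seq' stands in for the incremental number from the question.
data = [
    {'name': 'Atomic', 'age': 28, 'seq': 1},
    {'name': 'Atomic', 'age': 28, 'seq': 2},
    {'name': 'Atomic', 'age': 29, 'seq': 3},
    {'name': 'Atomic', 'age': 29, 'seq': 4},
    {'name': 'Atomic', 'age': 30, 'seq': 5},
]

KEY_FIELDS = ('name', 'age')  # assumed subset of fields that defines a duplicate


def same_record(a, b):
    """Return True when both records agree on every field in KEY_FIELDS."""
    return all(a[f] == b[f] for f in KEY_FIELDS)


dq = collections.deque()
for item in data:
    if dq and same_record(dq[-1], item):
        dq[-1] = item      # duplicate of the last kept record: replace it with the newer one
    else:
        dq.append(item)    # first record, or a record that differs from the last kept one

print([d['seq'] for d in dq])  # [2, 4, 5]
```

Like the code above, this relies on the data being sorted so that duplicates are adjacent.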
Chetan
  • @Marc The first line of the question says the data has already been sorted, and only the duplicates remain to be removed. So for the input you mention, this code works fine once the data is sorted. – Chetan Jan 22 '21 at 07:06
data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

# Dicts are not hashable, so convert each one to a tuple of (key, value) pairs
unique = {tuple(x.items()) for x in data}
print([dict(x) for x in unique])

Output (the order may vary, because a set is unordered):

[{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}]
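
If the order matters, or only some fields should decide what counts as a duplicate, a dict keyed by those fields is one alternative to the set. The sketch below assumes the hypothetical names `KEY_FIELDS` and `latest`, and relies on dicts preserving insertion order (Python 3.7+). A later record overwrites an earlier one with the same key, so the newer record is the one kept.

```python
data = [
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 28},
    {'name': 'Atomic', 'age': 29},
]

KEY_FIELDS = ('name', 'age')  # assumed subset of fields that defines a duplicate

latest = {}
for item in data:
    key = tuple(item[f] for f in KEY_FIELDS)
    latest[key] = item  # overwrites the older record; the key keeps its first-seen position

result = list(latest.values())
print(result)  # [{'name': 'Atomic', 'age': 28}, {'name': 'Atomic', 'age': 29}]
```

Unlike an adjacent comparison, this does not need the data to be sorted, but the values of the key fields must be hashable.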
Lior Cohen