
I have a list of dictionaries like this:

time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]

I want to remove the duplicate dictionaries from this list. Here is what I tried:

import ast
# Stringify each dict so it can go into a set, then parse the unique strings back into dicts
final = [ast.literal_eval(el1) for el1 in set([str(el2) for el2 in time_array_final])]

It eventually works, but there is an issue: I want to keep this data in its original order, and the order changes in my output. Is there a way to remove duplicates while maintaining the order of the original list?

Note: the output should contain only unique items; when an item repeats, one copy of it should be kept, as the code above does. For this example, the expected output is:

[{'day': 15, 'month': 5},{'day': 29, 'month': 5},{'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}]
Pranav Hosangadi
back-new
  • It would really help if you provided info on what you got and what you expect, along with a minimal set of input data that illustrates this, a.k.a. a minimal reproducible example. – Ulrich Eckhardt Jun 16 '22 at 16:03
  • Maybe if you had formatted your data with one dict per line so that it's readable, more people would've noticed that it's sorted, could've asked about that, and could've written better solutions taking advantage of that... – Kelly Bundy Jun 16 '22 at 19:41
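Following up on the observation that the data is sorted: if you can rely on every duplicate sitting right next to its first occurrence (an assumption; nothing in the question guarantees this for other inputs), `itertools.groupby` can collapse each run of equal dicts without hashing them at all. A minimal sketch on a shortened copy of the question's list:

```python
from itertools import groupby

# Assumption: the list is sorted (or at least grouped), so duplicates are adjacent.
# This is a shortened copy of the question's data.
time_array_final = [
    {'day': 15, 'month': 5}, {'day': 29, 'month': 5},
    {'day': 10, 'month': 6}, {'day': 10, 'month': 6},
    {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6},
    {'day': 14, 'month': 6},
    {'day': 15, 'month': 6}, {'day': 15, 'month': 6},
]

# Without a key function, groupby() groups consecutive elements that compare
# equal with ==, so the dicts never need to be hashed; keep one per run.
deduped = [first for first, _run in groupby(time_array_final)]
print(deduped)
# [{'day': 15, 'month': 5}, {'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6}, {'day': 15, 'month': 6}]
```

If the input is not grouped, equal dicts separated by other items will all survive, so this only fits data shaped like the question's.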

3 Answers


Use a set to keep track of the items you have already seen. Each item is converted to a string first, because dictionaries are unhashable and cannot be stored in a set (trying it gives "TypeError: unhashable type: 'dict'"). Iterate over the original list and append an element only if its string representation has not been seen yet.

time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]

time_array_final_unique = []
time_array_final_set = set()

for d in time_array_final:
    if str(d) not in time_array_final_set:
        time_array_final_unique.append(d)
        time_array_final_set.add(str(d))
print(time_array_final_unique)
# [{'day': 15, 'month': 5}, {'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6}, {'day': 15, 'month': 6}]
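A quick way to confirm the hashing claim above: adding a dict to a set raises `TypeError`, while its string form is accepted. A minimal sketch:

```python
seen = set()
error_message = None

try:
    # dicts are mutable and define no hash, so a set cannot store them
    seen.add({'day': 15, 'month': 5})
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'dict'

# A string is hashable, which is why the loop above stores str(d) instead of d.
seen.add(str({'day': 15, 'month': 5}))
print(seen)  # {"{'day': 15, 'month': 5}"}
```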
Timur Shtatland
  • Hi, thanks for the answer, but there seems to be a problem: records are missing from the output, for example {'day': 12, 'month': 6} and {'day': 15, 'month': 6}. Since these repeat, I want each of them as a single record. – back-new Jun 16 '22 at 16:04
  • OP seems to want to de-duplicate their list, not get unique items – Pranav Hosangadi Jun 16 '22 at 16:07
  • I have updated the question; kindly check the sample output @PranavHosangadi – back-new Jun 16 '22 at 16:08
  • why string conversion? – Copperfield Jun 16 '22 at 17:57
  • @Copperfield: I added the explanation to the answer. – Timur Shtatland Jun 16 '22 at 18:07
  • Sure, but if for whatever reason a dict happens to be built in a different order, it will fail to remove that duplicate, for example: `(str({'day': 15, 'month': 5}) == str({'month': 5, 'day': 15}))==False` – Copperfield Jun 16 '22 at 18:13
  • A more reliable way to turn a dict into something that can be used in a set is a frozenset (an immutable set, which can therefore be an element of a set) over the items of the dictionary, as long as the values are also hashable: `frozenset({'day': 15, 'month': 5}.items()) == frozenset({'month': 5, 'day': 15}.items())` – Copperfield Jun 16 '22 at 18:23
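Applying that suggestion to the loop in this answer only changes how the "seen" key is built. A minimal sketch, using a hypothetical list in which the second dict has the same content as the first but a different key order:

```python
# Hypothetical sample: the second dict equals the first, but str() of it differs.
time_array_final = [
    {'day': 15, 'month': 5},
    {'month': 5, 'day': 15},
    {'day': 10, 'month': 6},
    {'day': 10, 'month': 6},
]

time_array_final_unique = []
seen = set()

for d in time_array_final:
    # frozenset(d.items()) ignores key order; it requires hashable values.
    key = frozenset(d.items())
    if key not in seen:
        seen.add(key)
        time_array_final_unique.append(d)

print(time_array_final_unique)
# [{'day': 15, 'month': 5}, {'day': 10, 'month': 6}]
```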

You can create a dictionary where the key is the string representation of the items in your list, and the value is the actual item.

time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]

dedupe_dict = {str(item): item for item in time_array_final}

Upon encountering a duplicate item, the dict comprehension overwrites the stored value with the duplicate, but that makes no material difference because both items are identical. The key itself keeps the position of its first insertion, so the first occurrence determines the order.

Since Python 3.7 (and in CPython 3.6 as an implementation detail), dictionaries preserve insertion order, so dict.values() gives you the output you need.

deduped_list = list(dedupe_dict.values())

Which gives:

[{'day': 15, 'month': 5},
 {'day': 29, 'month': 5},
 {'day': 10, 'month': 6},
 {'day': 12, 'month': 6},
 {'day': 14, 'month': 6},
 {'day': 15, 'month': 6}]
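Both properties can be checked on a small hypothetical list: re-assigning an existing key replaces its value but leaves the key in the slot from its first insertion. A minimal sketch:

```python
# Hypothetical sample: the third dict duplicates the first.
items = [{'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 10, 'month': 6}]
dedupe_dict = {str(item): item for item in items}

# The repeated key was re-assigned but did not move to the end.
print(list(dedupe_dict))
# ["{'day': 10, 'month': 6}", "{'day': 12, 'month': 6}"]

# The stored value is the last duplicate object, which is equal to the first one.
print(dedupe_dict[str(items[0])] is items[2])  # True
```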

As noted by @Copperfield in their comments on another answer, str(dict) is not the most reliable way of stringifying dicts for comparison, because the resulting string depends on the order in which the keys were inserted.

d1 = {'day': 1, 'month': 2}
d2 = {'month': 2, 'day': 1}

d1 == d2 # True
str(d1) == str(d2) # False

To get around this, you could create a frozenset of the dict.items(), and use that as your key (provided all the values in your dict are hashable) like so:

dedupe_dict = {frozenset(d.items()): d for d in time_array_final}
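An end-to-end sketch of the frozenset variant, on a hypothetical list where one duplicate has its keys in a different order. The frozensets are only the keys; the values are still ordinary dicts, so `dict.values()` works exactly as before:

```python
# Hypothetical sample: the second dict equals the first but was built in a different order.
time_array_final = [
    {'day': 15, 'month': 5},
    {'month': 5, 'day': 15},
    {'day': 12, 'month': 6},
    {'day': 12, 'month': 6},
]

# Equal dicts produce equal frozensets of items, regardless of key order.
dedupe_dict = {frozenset(d.items()): d for d in time_array_final}

deduped_list = list(dedupe_dict.values())
print(deduped_list)
# [{'month': 5, 'day': 15}, {'day': 12, 'month': 6}]
```

Note that the surviving value is the last duplicate (here the reordered dict), in the position of the first one. That only matters if you care about the key order inside each dict; the dicts still compare equal.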
Pranav Hosangadi
  • Why the string conversion? In this particular example it might work, but it will fail if a dict happens to be built in a different order, as I mentioned to @TimurShtatland – Copperfield Jun 16 '22 at 18:29
  • @Copperfield good point. I modified my answer to include your suggestion – Pranav Hosangadi Jun 16 '22 at 18:37
  • For the `frozenset` method, you'd still need to unpack it back to the original formatting with something like `list(map(dict, dedupe_dict))` no? – BeRT2me Jun 16 '22 at 21:19
  • @BeRT2me no, because the frozensets are the _keys_ in that dictionary. The values are still the original dictionaries, and that's what we use in `deduped_list = list(dedupe_dict.values())` – Pranav Hosangadi Jun 16 '22 at 21:36

As an addendum to @BeRT2me's answer, you can go a step further and use the ListBasedSet recipe shown in the Python documentation for collections.abc:

import collections.abc

class ListBasedSet(collections.abc.Set):
    ''' Alternate set implementation favoring space over speed
        and not requiring the set elements to be hashable. '''
    def __init__(self, iterable):
        self.elements = lst = []
        for value in iterable:
            if value not in lst:
                lst.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)

Put it in your toolkit; using it is as simple as:

>>> time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]
>>> 
>>> expected=[{'day': 15, 'month': 5},{'day': 29, 'month': 5},{'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}]
>>> 
>>> expected == list(ListBasedSet(time_array_final))
True
>>> 
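The point of the list-based approach is that membership is tested with `==`, so it also handles dicts whose values are unhashable (where the frozenset trick fails), at the cost of a linear scan per item (quadratic overall). A minimal sketch with a hypothetical function-form equivalent of the recipe's `__init__` loop and hypothetical data containing list values:

```python
def unique_in_order(iterable):
    """Hypothetical helper: same linear scan as ListBasedSet.__init__."""
    result = []
    for value in iterable:
        if value not in result:  # compares with ==, so no hashing is needed
            result.append(value)
    return result

# Hypothetical data: the 'tags' values are lists, which are unhashable.
records = [
    {'day': 12, 'tags': ['a', 'b']},
    {'day': 12, 'tags': ['a', 'b']},
    {'day': 14, 'tags': []},
]

frozenset_failed = False
try:
    keys = {frozenset(d.items()) for d in records}
except TypeError as exc:
    frozenset_failed = True
    print("frozenset approach fails:", exc)  # frozenset approach fails: unhashable type: 'list'

print(unique_in_order(records))
# [{'day': 12, 'tags': ['a', 'b']}, {'day': 14, 'tags': []}]
```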
Copperfield