Unique elements in a list of dictionary efficiently

Question

I would like to get the unique elements from a list of dictionaries based on the value of one field, while retaining the other fields.

The following is the format of the data I have.

[ {id:"1000", text: "abc", time_stamp: "10:30"},
  {id:"1001", text: "abc", time_stamp: "10:31"},
  {id:"1002", text: "bcd", time_stamp: "10:32"} ]

I would like an output as follows: (Unique based on the text but retains other fields)

[ {id:"1000", text: "abc", time_stamp: "10:30"}, # earlier time stamp
  {id:"1002", text: "bcd", time_stamp: "10:32"} ]

Please note that uniqueness is based on the text only, and I would like to retain the id and time_stamp values as well. This is different from the previously asked question "Python - List of unique dictionaries".

I tried:

Method 1: Collecting only the text values from the dictionaries into a list and passing it to a set gives the unique text values, but I lose the id and time_stamp.

Method 2: I also traversed the list of dictionaries and checked whether each text value was already present in unique_list_of_text; if not, I appended the dictionary to unique_list_of_elements. But this code takes a long time, as I am working with a data set of 350,000 records. Is there a better way to do it? Code for method 2:

def find_unique_elements(list_of_elements):
    no_of_elements = len(list_of_elements)
    unique_list_of_text = []
    unique_list_of_elements = []
    for iterator in range(0, no_of_elements):
        # "not in" on a list scans the whole list, which is slow for large inputs
        if list_of_elements[iterator]['text'] not in unique_list_of_text:
            unique_list_of_text.append(list_of_elements[iterator]['text'])
            unique_list_of_elements.append(list_of_elements[iterator])
    return unique_list_of_elements

han solo · Answer 1 · 2019-03-15T17:50:03.403

1

You could build a new list and check, for each item, whether its text has already been seen.

To make that check faster, I'd use a better data structure for the seen values: a set.

$ cat unique.py

id = 'id'
text = 'text'
time_stamp = 'time_stamp'

data = [ {id:"1000", text: "abc", time_stamp: "10:30"},
   {id:"1001", text: "abc", time_stamp: "10:31"},
   {id:"1002", text: "bcd", time_stamp: "10:32"} ]

keys = set()        # text values seen so far
unique_items = []   # first item found for each text value
for item in data:
    if item['text'] not in keys:
        unique_items.append(item)
        keys.add(item['text'])

print(unique_items)

$ python unique.py
[{'text': 'abc', 'id': '1000', 'time_stamp': '10:30'}, {'text': 'bcd', 'id': '1002', 'time_stamp': '10:32'}]
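
The loop above can also be wrapped in a reusable function. This is only a minimal sketch: the name `unique_by` and its `key` parameter are illustrative and not part of the answer. It keeps the first record seen for each value, so it returns the earliest time stamp only if the input is already ordered by time, as in the question's sample data.

```python
def unique_by(records, key):
    """Return the first record seen for each distinct value of record[key]."""
    seen = set()      # values of `key` encountered so far
    unique = []       # first record for each value, in input order
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique


data = [
    {"id": "1000", "text": "abc", "time_stamp": "10:30"},
    {"id": "1001", "text": "abc", "time_stamp": "10:31"},
    {"id": "1002", "text": "bcd", "time_stamp": "10:32"},
]

result = unique_by(data, "text")
print([record["id"] for record in result])  # ['1000', '1002']
```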

edited Mar 15 '19 at 17:50

answered Mar 15 '19 at 17:18

han solo


If you want to retain the first value, then the `set` is the way to go. `in` for a list is O(N), but for a `set` it's O(1) – roganjosh Mar 15 '19 at 17:27
The first timestamp needs to be retained @roganjosh – Yash Tibrewal Mar 15 '19 at 17:29
Dict and set membership is the same complexity, but the `any()` wasn't making use of it. That was scanning a _list_ – roganjosh Mar 15 '19 at 17:49
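
The complexity difference mentioned in these comments can be observed with a rough timing sketch. The collection size and repeat count below are arbitrary choices for illustration, and the absolute numbers will vary by machine; the point is only that the list lookup has to scan every element while the set lookup does not.

```python
import timeit

n = 100_000  # arbitrary size, chosen only for illustration
texts_list = [str(i) for i in range(n)]
texts_set = set(texts_list)
missing = "not present"  # worst case for the list: every element is compared

# Time 20 membership tests against each container.
list_seconds = timeit.timeit(lambda: missing in texts_list, number=20)
set_seconds = timeit.timeit(lambda: missing in texts_set, number=20)

print(f"list: {list_seconds:.6f} s, set: {set_seconds:.6f} s")
```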

Mykola Zotko · Accepted Answer · 2019-03-15T19:15:33.157

1

You can create a dictionary from the reversed list and get values from that dictionary:

id, text, time_stamp = 'id', 'text', 'time_stamp'

l = [ {id:"1000", text: "abc", time_stamp: "10:30"},
  {id:"1001", text: "abc", time_stamp: "10:31"},
  {id:"1002", text: "bcd", time_stamp: "10:32"} ]

# iterating in reverse lets earlier items overwrite later ones with the same text
d = {i[text]: i for i in reversed(l)}
new_l = list(d.values())
print(new_l)
# [{'id': '1002', 'text': 'bcd', 'time_stamp': '10:32'}, {'id': '1000', 'text': 'abc', 'time_stamp': '10:30'}]

# if the order should be preserved
new_l.reverse()
print(new_l)
# [{'id': '1000', 'text': 'abc', 'time_stamp': '10:30'}, {'id': '1002', 'text': 'bcd', 'time_stamp': '10:32'}]

If the order in the final list is important, use OrderedDict instead of dict on Python 3.6 and below, where the language does not guarantee that a plain dict keeps insertion order.
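
A minimal sketch of that OrderedDict variant, assuming the same sample records as in the question (the variable names here are illustrative):

```python
from collections import OrderedDict

records = [
    {"id": "1000", "text": "abc", "time_stamp": "10:30"},
    {"id": "1001", "text": "abc", "time_stamp": "10:31"},
    {"id": "1002", "text": "bcd", "time_stamp": "10:32"},
]

# Walk the list backwards so that an earlier record overwrites a later one
# with the same text. Overwriting a key keeps its position in the OrderedDict.
first_by_text = OrderedDict()
for record in reversed(records):
    first_by_text[record["text"]] = record

unique_records = list(first_by_text.values())
unique_records.reverse()  # undo the reversal so the result runs front to back

print([record["id"] for record in unique_records])  # ['1000', '1002']
```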

edited Mar 15 '19 at 19:15

answered Mar 15 '19 at 19:02

Mykola Zotko


so the logic behind taking the reverse of the list is to overwrite the earlier time stamp right? ... This is a really good solution. Thanks!! – Yash Tibrewal Mar 16 '19 at 05:12
@YashTibrewal Exactly. – Mykola Zotko Mar 16 '19 at 07:32
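
The same "earliest entry wins" effect can also be sketched as a single forward pass with `dict.setdefault`, which stores a value only the first time a key is seen. This is an alternative to the answer's reverse-and-overwrite approach, not the answerer's code, and it relies on dict insertion order (guaranteed from Python 3.7).

```python
records = [
    {"id": "1000", "text": "abc", "time_stamp": "10:30"},
    {"id": "1001", "text": "abc", "time_stamp": "10:31"},
    {"id": "1002", "text": "bcd", "time_stamp": "10:32"},
]

first_by_text = {}
for record in records:
    # setdefault leaves an existing entry untouched, so later duplicates are ignored
    first_by_text.setdefault(record["text"], record)

unique_records = list(first_by_text.values())
print([record["id"] for record in unique_records])  # ['1000', '1002']
```

Here the output follows the order in which each text first appears. With the reversed approach the order follows where each text last appears, which can differ when duplicates are interleaved with other values.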
