I have 2 lists of dictionaries.

list1 = [{'user_id':23, 'user_name':'John', 'age':30},
         {'user_id':24, 'user_name':'Shaun', 'age':31},
         {'user_id':25, 'user_name':'Johny', 'age':32}]

list2 =[{'user_id':23},
        {'user_id':25}]

Now I want the output to be

list3 = [{'user_id':23, 'user_name':'John', 'age':30},
         {'user_id':25, 'user_name':'Johny','age':32}]

I want the most efficient way because my list1 might contain millions of rows.

curiousguy
  • 1
    Did you try something that wasn’t fast enough? – Ry- Jul 10 '17 at 13:23
  • Did you see [this](https://stackoverflow.com/questions/382466/comparing-massive-lists-of-dictionaries-in-python) or [this](https://stackoverflow.com/questions/9845369/comparing-2-lists-consisting-of-dictionaries-with-unique-keys-in-python). Were they not fast enough? Did you try implementing this and were met with performance issues? – idjaw Jul 10 '17 at 13:23
  • 1
    If you only need to perform one scan of `list1`, then you should use Jean-François Fabre's strategy. But if you need to search it multiple times then you should seriously consider transforming the list into a dict, as per omri_saadon's answer. Rather than using dicts for the inner items of this new dict it would save RAM if you used tuples or namedtuples. – PM 2Ring Jul 10 '17 at 13:42

4 Answers


You'll have to transform `list2` a little to get fast lookups. I'd make a set of the IDs out of it:

list1 = [{'user_id':23, 'user_name':'John','age':30},
         {'user_id':24, 'user_name':'Shaun','age':31},
         {'user_id':25, 'user_name':'Johny','age':32}]

list2 =[{'user_id':23},
        {'user_id':25}]

list2_ids = {d['user_id'] for d in list2}

Then build `list3` using a filtered list comprehension. Here `in list2_ids` is very fast because it's an O(1) set membership test rather than a linear search:

list3 = [x for x in list1 if x['user_id'] in list2_ids]

print(list3)

result:

[{'user_id': 23, 'user_name': 'John', 'age': 30}, {'user_id': 25, 'user_name': 'Johny', 'age': 32}]
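For scale, the same idea can be wrapped in a small helper; a minimal sketch (the function name and `key` parameter are illustrative, not from the answer above):

```python
def filter_by_ids(rows, id_dicts, key='user_id'):
    # Build the set once: O(len(id_dicts)); each membership test is then O(1)
    wanted = {d[key] for d in id_dicts}
    return [row for row in rows if row[key] in wanted]

list1 = [{'user_id': 23, 'user_name': 'John', 'age': 30},
         {'user_id': 24, 'user_name': 'Shaun', 'age': 31},
         {'user_id': 25, 'user_name': 'Johny', 'age': 32}]
list2 = [{'user_id': 23}, {'user_id': 25}]

print(filter_by_ids(list1, list2))
```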
Jean-François Fabre

I would transform your `list1` into a dictionary where the key is the `user_id` and the value is the name and age.

Now, even if this dict has a lot of elements, looking up a key is O(1).

In that case, the overall complexity of finding all the user IDs is O(len(list2)):

dict1 = {23 : {'user_name':'John', 'age':30},
         24 : {'user_name':'Shaun', 'age':31},
         25 : {'user_name':'Johny', 'age':32}}

list2 =[{'user_id':23},
        {'user_id':25}]

res = [dict1[user['user_id']] for user in list2 if user['user_id'] in dict1]

print (res)

>>> [{'user_name': 'John', 'age': 30}, {'user_name': 'Johny', 'age': 32}]
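If `list1` arrives as a list of dicts, building this keyed structure is a one-time O(len(list1)) pass. A minimal sketch that keeps the whole row as the value, so the result still contains `user_id` (variable names are illustrative):

```python
list1 = [{'user_id': 23, 'user_name': 'John', 'age': 30},
         {'user_id': 24, 'user_name': 'Shaun', 'age': 31},
         {'user_id': 25, 'user_name': 'Johny', 'age': 32}]
list2 = [{'user_id': 23}, {'user_id': 25}]

# One-time O(n) pass: index the rows by user_id
by_id = {row['user_id']: row for row in list1}

# Each lookup is O(1); IDs missing from list1 are simply skipped
list3 = [by_id[u['user_id']] for u in list2 if u['user_id'] in by_id]
print(list3)
```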
omri_saadon
  • To transform my `list1` like that, I need to iterate over the entire `list1`, right? That itself adds complexity. – curiousguy Jul 10 '17 at 13:51
  • @curiousguy, You need to do it once. After it you have this data structure that you can apply a lot of searches on it with O(1) complexity. – omri_saadon Jul 10 '17 at 13:52
  • Yes, I agree with you, a search on that format is very fast. The problem is my `list1` and `list2` keep changing based on the inputs, hence I have to do it every time. – curiousguy Jul 10 '17 at 13:55
  • @curiousguy Can't you build `list1` on the fly in the above structure? Aren't you the one who controls it? – omri_saadon Jul 10 '17 at 13:58
  • Yes actually , I will try that for sure. – curiousguy Jul 10 '17 at 14:03
  • 1
    Also look on @PM 2Ring comment. This solution is good when you need to do multiple searches. If you need to search only once, the solution of Jean-François Fabre is more suitable. – omri_saadon Jul 10 '17 at 14:05

You can use pandas to merge the two lists together:
1. Convert each list of dicts to a DataFrame.
2. Merge the two DataFrames on `'user_id'`.

import pandas as pd

list1 = [{'user_id':23, 'user_name':'John', 'age':30},
         {'user_id':24, 'user_name':'Shaun', 'age':31},
         {'user_id':25, 'user_name':'Johny', 'age':32}]
list2 = [{'user_id':23},
         {'user_id':25}]

df1 = pd.DataFrame(list1)
df1
   age  user_id user_name
0   30       23      John
1   31       24     Shaun
2   32       25     Johny

df2 = pd.DataFrame(list2)
df2
   user_id
0       23
1       25

pd.merge(df2, df1, on='user_id')
   user_id  age user_name
0       23   30      John
1       25   32     Johny
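If you need the merged result back as a list of dicts, matching the `list3` in the question, `DataFrame.to_dict('records')` will do that. A sketch (assuming pandas is installed):

```python
import pandas as pd

list1 = [{'user_id': 23, 'user_name': 'John', 'age': 30},
         {'user_id': 24, 'user_name': 'Shaun', 'age': 31},
         {'user_id': 25, 'user_name': 'Johny', 'age': 32}]
list2 = [{'user_id': 23}, {'user_id': 25}]

merged = pd.merge(pd.DataFrame(list2), pd.DataFrame(list1), on='user_id')
list3 = merged.to_dict('records')  # list of plain dicts, one per row
print(list3)
```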
galaxyan

As previous posters said, you need to create a set of the IDs from `list2`:

list2_ids = {d['user_id'] for d in list2}

After you've done this, you can also use the `filter` function (in Python 3 it returns a lazy iterator, so wrap it in `list` if you need a list):

list3 = list(filter(lambda x: x['user_id'] in list2_ids, list1))

This, while not the most optimized, has the benefit of having multiple implementations for parallel computation (which you might need if you're dealing with a large amount of data).

That being said, the best solution performance-wise is probably a set intersection:

unique_ids = {d['user_id'] for d in list1} & {d['user_id'] for d in list2}
list3 = [x for x in list1 if x['user_id'] in unique_ids]

If you are sure the lists don't contain duplicate IDs, you can skip the set conversion.
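Both approaches yield the same rows on the question's sample data; a quick sketch checking that:

```python
list1 = [{'user_id': 23, 'user_name': 'John', 'age': 30},
         {'user_id': 24, 'user_name': 'Shaun', 'age': 31},
         {'user_id': 25, 'user_name': 'Johny', 'age': 32}]
list2 = [{'user_id': 23}, {'user_id': 25}]

# filter() approach (in Python 3, filter is lazy, so materialize with list)
list2_ids = {d['user_id'] for d in list2}
via_filter = list(filter(lambda x: x['user_id'] in list2_ids, list1))

# set-intersection approach
unique_ids = {d['user_id'] for d in list1} & {d['user_id'] for d in list2}
via_intersection = [x for x in list1 if x['user_id'] in unique_ids]

print(via_filter == via_intersection)  # True
```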

Djib2011