Python3 How to join two lists of dicts by unique keys

Question

I have two lists:

list1 = [ {'sth': 13, 'important_key1': 'AA', 'important_key2': '3'}, {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'}, {'sth_else': 'abc', 'important_key1': 'ZZ', 'important_key2': '5'}]
list2 = [ {'why-not': 'tAk', 'important_key1': 'GG', 'important_key2': '4'}, {'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'}]

I want to return a list containing the objects from list1, but if any element of list2 has the same important_key1 and important_key2 values, I want that element from list2 instead.

So the output should be:

[ {'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'}, {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'}, {'sth_else': 'abc', 'important_key1': 'ZZ', 'important_key2': '5'}]

It is not complicated to do with two or three loops, but I wonder whether there is a simpler way using list comprehensions or something similar.

This is the "normal" way:

list1 = [ {'sth': 13, 'important_key1': 'AA', 'important_key2': '3'}, {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'}]
list2 = [ {'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'}, {'why-not': 'tAk', 'important_key1': 'GG', 'important_key2': '4'}]

final_list = []
for element in list1:
    there_was_in_list2 = False
    for another_element in list2:
        if element['important_key1'] == another_element['important_key1'] and element['important_key2'] == another_element['important_key2']:
            final_list.append(another_element)
            there_was_in_list2 = True
            break
    if not there_was_in_list2:
        final_list.append(element)
print(final_list)

Is there a Pythonic way to do that?

Ajax1234 · Answer 1 · 2018-09-21T14:02:49.587

3

You can use a list comprehension:

list1 = [{'sth': 13, 'important_key1': 'AA', 'important_key2': '3'}, {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'}]
list2 = [{'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'}, {'why-not': 'tAk', 'important_key1': 'GG', 'important_key2': '4'}]
vals = ['important_key1', 'important_key2']
new_list = [[c if any(c[a] == i[a] for a in vals) else i for c in list2] for i in list1]
final_result = [i[0] for i in new_list if i]

Output:

[{'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'}, {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'}]
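
Note that the comprehension above tests the keys with any and keeps only the first entry of each inner list, so it reproduces this output for this particular input only. A minimal sketch of a comprehension that requires both keys to match and keeps unmatched dicts from list1, assuming the first match in list2 should win (the helper name same_keys is hypothetical, not part of the original answer):

```python
list1 = [
    {'sth': 13, 'important_key1': 'AA', 'important_key2': '3'},
    {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'},
    {'sth_else': 'abc', 'important_key1': 'ZZ', 'important_key2': '5'},
]
list2 = [
    {'why-not': 'tAk', 'important_key1': 'GG', 'important_key2': '4'},
    {'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'},
]
vals = ['important_key1', 'important_key2']


def same_keys(a, b):
    # Hypothetical helper: True only when every key in vals has the
    # same value in both dicts.
    return all(a[k] == b[k] for k in vals)


# For each dict in list1, take the first matching dict from list2,
# falling back to the list1 dict itself when nothing matches.
final_result = [next((c for c in list2 if same_keys(c, i)), i) for i in list1]
print(final_result)
```

This still scans list2 once per element of list1, so it is no faster than the nested loops in the question; it only shortens them.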

edited Sep 21 '18 at 14:02

answered Sep 21 '18 at 13:46

Ajax1234


it is not my output. Check "So the output should be:" again :) – Piotr Wasilewicz Sep 21 '18 at 13:50
Dicts which are not in list2 should still be in the output list. – Piotr Wasilewicz Sep 21 '18 at 13:51
Wrapping everything in list comprehensions just because you can is not pythonic at all. Especially for more complicated tasks like this one it really makes the code more difficult to read and hence to maintain. [Readability counts.](https://www.python.org/dev/peps/pep-0020/) – a_guest Sep 21 '18 at 14:00
It doesn't work in two cases. First, the lists are not sorted, so the element with the same values for important_key1 and important_key2 may be at a different position. Second, the two lists don't have to be the same length. – Piotr Wasilewicz Sep 21 '18 at 14:15
I will add this to my example – Piotr Wasilewicz Sep 21 '18 at 14:15
@PiotrWasilewicz As blhsing pointed out below, the rules for producing the desired output are not clear. See [mcve](https://stackoverflow.com/help/mcve) – Ajax1234 Sep 21 '18 at 14:19
@Ajax1234 I don't understand why it is not clear. I even gave an example of working code, but I thought it was just a little too long. If there is any doubt, this code should clear it up. – Piotr Wasilewicz Sep 23 '18 at 21:31

score 2 · Accepted Answer · edited Sep 24 '18 at 20:25


You can convert list2 to a dict indexed by a tuple of the values of the important keys, and then look up the corresponding tuple for each dict in list1 as you iterate through list1 in a list comprehension. This reduces the time complexity from your O(n*m) to O(n + m). Building the dict from the reversed list2 ensures that, when several dicts in list2 share the same key values, the first one is used, as in your loop:

keys = ['important_key1', 'important_key2']
d2 = {tuple(d[k] for k in keys): d for d in list2[::-1]}
print([d2.get(tuple(d[k] for k in keys), d) for d in list1])

This outputs (with your sample input):

[{'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'}, {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'}, {'sth_else': 'abc', 'important_key1': 'ZZ', 'important_key2': '5'}]

As you described in your question, only {'sth': 13, 'important_key1': 'AA', 'important_key2': '3'} in list1 would get replaced by {'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'} because only this dict has both important_key1 and important_key2 matching those of a dict in list2.
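
To illustrate why list2 is reversed before building the dict, here is a small sketch with two dicts in list2 that share the same key values. The function name replace_from and the 'first'/'second' sample entries are made up for this illustration, and it assumes the key values are hashable:

```python
keys = ['important_key1', 'important_key2']


def replace_from(list1, list2, keys):
    # Index list2 by its key values. Iterating in reverse means that, for
    # duplicate key values, the earliest dict in list2 is written last
    # and therefore is the one kept in the index.
    index = {tuple(d[k] for k in keys): d for d in reversed(list2)}
    # Look each list1 dict up in the index; keep it unchanged on a miss.
    return [index.get(tuple(d[k] for k in keys), d) for d in list1]


list1 = [{'sth': 13, 'important_key1': 'AA', 'important_key2': '3'}]
list2 = [
    {'first': 1, 'important_key1': 'AA', 'important_key2': '3'},
    {'second': 2, 'important_key1': 'AA', 'important_key2': '3'},
]
result = replace_from(list1, list2, keys)
print(result)  # the dict carrying 'first' is chosen
```

Without the reversal, the dict carrying 'second' would overwrite the earlier entry and be returned instead, which differs from the nested loop with break in the question.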

edited Sep 24 '18 at 20:25

a_guest


answered Sep 21 '18 at 13:53

blhsing


It doesn't work in two cases. First, the lists are not sorted, so the element with the same values for important_key1 and important_key2 may be at a different position. Second, the two lists don't have to be the same length. – Piotr Wasilewicz Sep 21 '18 at 14:14
I will add this to my example – Piotr Wasilewicz Sep 21 '18 at 14:15
Can you update your question with the inputs and expected outputs for these two cases? It isn't clear from your description what the outputs would be here. – blhsing Sep 21 '18 at 14:15
Updated my answer with your new input then. – blhsing Sep 21 '18 at 14:20
Clearly this answer's code is different than the one from the OP. Here the two lists are zipped while in the OP they are nested. – a_guest Sep 21 '18 at 14:25
I've updated my answer to reflect your new input then. – blhsing Sep 21 '18 at 14:27
It doesn't work on my production code and I don't know why yet. It seems to work for all the cases I can imagine, but when I have many, many values there must be something different :( I will check it tomorrow. – Piotr Wasilewicz Sep 21 '18 at 14:50
@PiotrWasilewicz It is because the code does something else than you have in mind. While you indicated a product between the two lists (i.e. nested for) this code just zips the lists together and hence will only check one element in `list2` for each element in `list1` (instead of scanning through the whole of `list2`). – a_guest Sep 21 '18 at 15:18
I've updated my answer so that dicts in `list1` would get replaced by any dict in `list2` that matches both important keys, as opposed to my previous answer, which would only match if the dict in `list1` and the dict in `list2` are of the same index in their respective lists. Please try again. – blhsing Sep 21 '18 at 15:38
@blhsing It's still not correct since you are taking the last match out of `list2` instead of the first (compare with OP). – a_guest Sep 22 '18 at 20:44
@a_guest You're right but it wasn't so hard to reverse the data and it works for me so it will accept this answer. – Piotr Wasilewicz Sep 24 '18 at 09:09
@PiotrWasilewicz Sure, this is probably the best answer, considering the improvement in time complexity. Since SO is a Q/A site it is important that answers really answer the actual question. @blhsing I have applied the corresponding edit. – a_guest Sep 24 '18 at 20:32

a_guest · Answer 3 · 2018-09-21T14:05:41.200

You can do without the there_was_in_list2 variable by using for...else. The else clause is executed when the preceding for loop finishes normally (i.e. it was not exited with break).

final_list = []
for element in list1:
    for another_element in list2:
        if element['important_key1'] == another_element['important_key1'] and element['important_key2'] == another_element['important_key2']:
            final_list.append(another_element)
            break
    else:
        final_list.append(element)

score 1 · Answer 4 · answered Sep 21 '18 at 14:34


If you want your code to be more concise yet maintain readability you can replace the second for loop with a combined next and filter:

final_list.append(next(
    filter(lambda x: ..., list2),
    element  # Default in case filter yields nothing.
))
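
For reference, a complete version of this idea might look like the sketch below. The predicate is left elided in the snippet above; the one used here is an assumption that simply repeats the comparison from the question's loop, and the helper name matches is hypothetical:

```python
list1 = [
    {'sth': 13, 'important_key1': 'AA', 'important_key2': '3'},
    {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'},
    {'sth_else': 'abc', 'important_key1': 'ZZ', 'important_key2': '5'},
]
list2 = [
    {'why-not': 'tAk', 'important_key1': 'GG', 'important_key2': '4'},
    {'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'},
]


def matches(element):
    # Hypothetical helper: build a predicate for filter() that is true for
    # dicts with the same values for both important keys as element.
    return lambda x: (x['important_key1'] == element['important_key1']
                      and x['important_key2'] == element['important_key2'])


final_list = []
for element in list1:
    final_list.append(next(
        filter(matches(element), list2),
        element,  # Default in case filter yields nothing.
    ))
print(final_list)
```

Because filter is lazy in Python 3, next stops at the first match, which mirrors the break in the original loop.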

answered Sep 21 '18 at 14:34

a_guest


score 1 · Answer 5 · answered Sep 21 '18 at 16:05

Most other approaches are already covered, so here is another idea, just to come up with as many possible routes as we can. This was fun btw, thank you :)

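l1, l2 = list1, list2  # the two input lists from the question
l3 = l1[:]  # start from a shallow copy of l1

for item in l2:
    for x, i in enumerate(l1):
        # Pair up values by position; this relies on both dicts listing
        # important_key1 and important_key2 as their 2nd and 3rd keys.
        k = list(zip(item.values(), i.values()))
        # A pair collapses to a one-element set when both values are equal.
        if len(set(k[1])) < len(k[1]) and len(set(k[2])) < len(k[2]):
            l3[x] = item

print(l3)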

(xenial)vash@localhost:~/python/stack_overflow/sept$ python3.7 uniq.py
[{'hmmm': 'no', 'important_key1': 'AA', 'important_key2': '3'}, {'oh!': 14, 'important_key1': 'FF', 'important_key2': '4'}, {'sth_else': 'abc', 'important_key1': 'ZZ', 'important_key2': '5'}]
