
Is there a way to merge two lists of dictionaries based on a single key:value pair in each dictionary? I searched and found some similar cases, but none of them matched what I need.

Suppose that I have two lists of dictionary as below:

[{'standoff_id': 1, 'entity_type': 'Concept', 'offset_start': 13, 'offset_end': 18, 'word': 'wheat'}, {'standoff_id': 2, 'entity_type': 'Concept', 'offset_start': 26, 'offset_end': 30, 'word': 'corn'}, {'standoff_id': 3, 'entity_type': 'Concept', 'offset_start': 61, 'offset_end': 67, 'word': 'barley'}]
[{'standoff_id': 1, 'concept_id': '8373'}, {'standoff_id': 2, 'concept_id': '12332'}, {'standoff_id': 3, 'concept_id': '823'}]

The two lists always have the same length, and I want to merge them pairwise, i.e. combine each dictionary in the first list with the dictionary in the second list that has the same 'standoff_id'.

My desired output is:

[{'standoff_id': 1, 'entity_type': 'Concept', 'offset_start': 13, 'offset_end': 18, 'word': 'wheat', 'concept_id': '8373'}, {'standoff_id': 2, 'entity_type': 'Concept', 'offset_start': 26, 'offset_end': 30, 'word': 'corn', 'concept_id': '12332'}, {'standoff_id': 3, 'entity_type': 'Concept', 'offset_start': 61, 'offset_end': 67, 'word': 'barley', 'concept_id': '823'}]

Any help would be much appreciated! Thank you!

Erwin
    Does this answer your question? [join two lists of dictionaries on a single key](https://stackoverflow.com/questions/5501810/join-two-lists-of-dictionaries-on-a-single-key) – Daria Pydorenko Aug 10 '21 at 08:46

2 Answers


Given that you say that your lists "always have the same length", and that they appear to be sorted already, I believe you can do this in one line. If you're using Python >= 3.9, you can use the new | operator for dictionaries:

a = [{'standoff_id': 1, 'entity_type': 'Concept', 'offset_start': 13, 'offset_end': 18, 'word': 'wheat'}, {'standoff_id': 2, 'entity_type': 'Concept', 'offset_start': 26, 'offset_end': 30, 'word': 'corn'}, {'standoff_id': 3, 'entity_type': 'Concept', 'offset_start': 61, 'offset_end': 67, 'word': 'barley'}]
b = [{'standoff_id': 1, 'concept_id': '8373'}, {'standoff_id': 2, 'concept_id': '12332'}, {'standoff_id': 3, 'concept_id': '823'}]

merged_dicts = [(d1 | d2) for d1, d2 in zip(a, b)]

If you're using Python 3.5 through 3.8, you can achieve the same result using dictionary unpacking (this also works on later versions):

merged_dicts = [{**d1, **d2} for d1, d2 in zip(a, b)]
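Both one-liners pair the dictionaries purely by position, so they rely on the two lists being the same length and in the same order. As a minimal sketch of how you might check that assumption while merging (the helper name merge_aligned is hypothetical, and the sample data is shortened from the question):

```python
def merge_aligned(first, second, key='standoff_id'):
    """Merge two lists of dicts pairwise, checking that each pair shares `key`.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(first) != len(second):
        raise ValueError('the two lists differ in length')
    merged = []
    for d1, d2 in zip(first, second):
        if d1[key] != d2[key]:
            raise ValueError(f'{key} mismatch: {d1[key]!r} != {d2[key]!r}')
        # Dictionary unpacking works on Python >= 3.5; values from d2 win on conflicts.
        merged.append({**d1, **d2})
    return merged


# Shortened sample data based on the question.
a = [{'standoff_id': 1, 'word': 'wheat'}, {'standoff_id': 2, 'word': 'corn'}]
b = [{'standoff_id': 1, 'concept_id': '8373'}, {'standoff_id': 2, 'concept_id': '12332'}]

merged_dicts = merge_aligned(a, b)
print(merged_dicts)
```

If the lists ever get out of step, this raises a ValueError instead of silently merging unrelated dictionaries.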

If your dictionary lists are not always sorted, and/or they are not always the same length, you could do something like this:

from operator import itemgetter 
from itertools import groupby

standoff_id = itemgetter('standoff_id')
merged_dicts = []

for k, v in groupby(sorted((a + b), key=standoff_id), key=standoff_id):
    merged_dicts.append({key:val for d in v for key, val in d.items()})
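To illustrate what the groupby approach does with unsorted lists of different lengths, here is a small sketch using shortened, shuffled sample data (the data is made up for illustration). Note that an id present in only one list still appears in the result, just without the fields from the other list:

```python
from itertools import groupby
from operator import itemgetter

# Shortened, deliberately unsorted sample data; b has no entry for id 3.
a = [
    {'standoff_id': 3, 'word': 'barley'},
    {'standoff_id': 1, 'word': 'wheat'},
    {'standoff_id': 2, 'word': 'corn'},
]
b = [
    {'standoff_id': 2, 'concept_id': '12332'},
    {'standoff_id': 1, 'concept_id': '8373'},
]

standoff_id = itemgetter('standoff_id')
merged_dicts = []

# groupby only groups adjacent items, so the combined list must be sorted first.
for _, group in groupby(sorted(a + b, key=standoff_id), key=standoff_id):
    merged = {}
    for d in group:
        merged.update(d)  # later dictionaries in the group overwrite earlier ones
    merged_dicts.append(merged)

print(merged_dicts)
# [{'standoff_id': 1, 'word': 'wheat', 'concept_id': '8373'},
#  {'standoff_id': 2, 'word': 'corn', 'concept_id': '12332'},
#  {'standoff_id': 3, 'word': 'barley'}]
```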

Or, using collections.ChainMap instead of a dictionary comprehension (for this data the resulting dictionaries compare equal, though their key order may differ):

from operator import itemgetter 
from itertools import groupby
from collections import ChainMap

standoff_id = itemgetter('standoff_id')
merged_dicts = []

for k, v in groupby(sorted((a + b), key=standoff_id), key=standoff_id):
    merged_dicts.append(dict(ChainMap(*v)))
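One difference worth knowing if the two lists can share keys other than 'standoff_id': a ChainMap returns the value from the first mapping that contains a key, whereas dictionary unpacking (like the dictionary comprehension above) keeps the value from the last one. A small sketch, where the 'source' field is a hypothetical conflicting key added purely for illustration:

```python
from collections import ChainMap

# 'source' is a made-up key that exists in both dictionaries.
d1 = {'standoff_id': 1, 'word': 'wheat', 'source': 'a'}
d2 = {'standoff_id': 1, 'concept_id': '8373', 'source': 'b'}

via_unpacking = {**d1, **d2}           # later dictionary wins
via_chainmap = dict(ChainMap(d1, d2))  # earlier mapping wins

print(via_unpacking['source'])  # b
print(via_chainmap['source'])   # a
```

With the data in the question the only shared key is 'standoff_id', which has the same value on both sides, so this difference does not show up there.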
Alex Waygood

You can try this:

  • Sort both lists by standoff_id so that matching entries line up at the same index.
  • d will hold the merged data.
a = [{'standoff_id': 1, 'entity_type': 'Concept', 'offset_start': 13, 'offset_end': 18, 'word': 'wheat'}, {'standoff_id': 2, 'entity_type': 'Concept', 'offset_start': 26, 'offset_end': 30, 'word': 'corn'}, {'standoff_id': 3, 'entity_type': 'Concept', 'offset_start': 61, 'offset_end': 67, 'word': 'barley'}]
b = [{'standoff_id': 1, 'concept_id': '8373'}, {'standoff_id': 2, 'concept_id': '12332'}, {'standoff_id': 3, 'concept_id': '823'}]

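# Sort both lists in place by standoff_id so matching entries share an index.
a.sort(key=lambda x: x['standoff_id'])
b.sort(key=lambda x: x['standoff_id'])

d = []

# This assumes b is at least as long as a; otherwise b[i] raises an IndexError.
for i in range(len(a)):
    # Pairs whose ids differ are skipped silently.
    if a[i]['standoff_id'] == b[i]['standoff_id']:
        d.append(dict(a[i]))  # copy, so the original dictionary is not modified
        d[-1].update(b[i])

print(d)
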
[{'standoff_id': 1, 'entity_type': 'Concept', 'offset_start': 13, 'offset_end': 18, 'word': 'wheat', 'concept_id': '8373'}, {'standoff_id': 2, 'entity_type': 'Concept', 'offset_start': 26, 'offset_end': 30, 'word': 'corn', 'concept_id': '12332'}, {'standoff_id': 3, 'entity_type': 'Concept', 'offset_start': 61, 'offset_end': 67, 'word': 'barley', 'concept_id': '823'}]
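If sorting the input lists in place is undesirable, or the ids might not line up index by index, a different technique from the sort-and-compare loop above is to index one list by 'standoff_id' and look each entry up. This is only a sketch with shortened, made-up sample data; the names b_by_id and unmatched are hypothetical:

```python
# Shortened sample data; the lists are unsorted and b has no entry for id 3.
a = [
    {'standoff_id': 2, 'word': 'corn'},
    {'standoff_id': 3, 'word': 'barley'},
    {'standoff_id': 1, 'word': 'wheat'},
]
b = [
    {'standoff_id': 1, 'concept_id': '8373'},
    {'standoff_id': 2, 'concept_id': '12332'},
]

# Build a lookup table from standoff_id to the matching dictionary in b.
b_by_id = {entry['standoff_id']: entry for entry in b}

d = []
unmatched = []  # ids from a that have no counterpart in b
for entry in a:
    match = b_by_id.get(entry['standoff_id'])
    if match is None:
        unmatched.append(entry['standoff_id'])
    else:
        d.append({**entry, **match})

print(d)
# [{'standoff_id': 2, 'word': 'corn', 'concept_id': '12332'},
#  {'standoff_id': 1, 'word': 'wheat', 'concept_id': '8373'}]
print(unmatched)
# [3]
```

The result follows the order of a, neither input list is modified, and entries without a partner are reported rather than dropped silently.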
Ram