
Imagine I have the following list of dictionaries. For every record (row of data), I want to merge the dictionaries of subfields into a single dictionary, so that in the end I have a list of dictionaries, one per record.

Data = [{'Name': 'bob', 'age': '40'},
        {'Name': 'tom', 'age': '30'},
        {'Country': 'US', 'City': 'Boston'},
        {'Country': 'US', 'City': 'New York'},
        {'Email': 'bob@fake.com', 'Phone': 'bob phone'},
        {'Email': 'tom@fake.com', 'Phone': 'none'}]
               
Output = [
{'Name': 'bob', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'bob@fake.com', 'Phone': 'bob phone'},
{'Name': 'tom', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'tom@fake.com', 'Phone': 'none'}
]

1 Answer

Related: How do I merge a list of dicts into a single dict?

I understand you know which dictionary relates to Bob and which dictionary relates to Tom by their position: dictionaries at even positions relate to Bob, while dictionaries at odd positions relate to Tom.

You can check whether a number is odd or even using % 2:

Data = [{'Name': 'bob', 'age': '40'},
        {'Name': 'tom', 'age': '30'},
        {'Country': 'US', 'City': 'Boston'},
        {'Country': 'US', 'City': 'New York'},
        {'Email': 'bob@fake.com', 'Phone': 'bob phone'},
        {'Email': 'tom@fake.com', 'Phone': 'none'}]
bob_dict = {}
tom_dict = {}
for i,d in enumerate(Data):
  if i % 2 == 0:
    bob_dict.update(d)
  else:
    tom_dict.update(d)
Output=[bob_dict, tom_dict]

Or alternatively:

Output = [{}, {}]
for i, d in enumerate(Data):
  Output[i%2].update(d)

This second approach is not only shorter to write, it's also faster to execute and easier to scale if you have more than 2 people.

Splitting the list into more than 2 dictionaries

k = 4 # number of dictionaries you want
Data = [{'Name': 'Alice', 'age': '40'},
        {'Name': 'Bob', 'age': '30'},
        {'Name': 'Charlie', 'age': '30'},
        {'Name': 'Diane', 'age': '30'},
        {'Country': 'US', 'City': 'Boston'},
        {'Country': 'US', 'City': 'New York'},
        {'Country': 'UK', 'City': 'London'},
        {'Country': 'UK', 'City': 'Oxford'},
        {'Email': 'alice@fake.com', 'Phone': 'alice phone'},
        {'Email': 'bob@fake.com', 'Phone': '12345'},
        {'Email': 'charlie@fake.com', 'Phone': '0000000'},
        {'Email': 'diane@fake.com', 'Phone': 'none'}]
Output = [{} for j in range(k)]
for i, d in enumerate(Data):
  Output[i%k].update(d)

# Output = [
#  {'Name': 'Alice', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'alice@fake.com', 'Phone': 'alice phone'},
#  {'Name': 'Bob', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'bob@fake.com', 'Phone': '12345'},
#  {'Name': 'Charlie', 'age': '30', 'Country': 'UK', 'City': 'London', 'Email': 'charlie@fake.com', 'Phone': '0000000'},
#  {'Name': 'Diane', 'age': '30', 'Country': 'UK', 'City': 'Oxford', 'Email': 'diane@fake.com', 'Phone': 'none'}
#]
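
The same grouping can also be written with stride slicing, since data[i::k] selects the dictionaries at positions i, i+k, i+2k, and so on. The following is only a minimal sketch under the same assumption as above (records are identified purely by position); the helper name merge_by_stride is hypothetical and not part of the original answer.

```python
def merge_by_stride(data, k):
    """Merge every k-th dictionary of `data` into one record (hypothetical helper)."""
    merged = []
    for i in range(k):
        record = {}
        # data[i::k] holds the dictionaries at positions i, i+k, i+2k, ...
        for d in data[i::k]:
            record.update(d)
        merged.append(record)
    return merged


sample = [{'Name': 'bob', 'age': '40'},
          {'Name': 'tom', 'age': '30'},
          {'Country': 'US', 'City': 'Boston'},
          {'Country': 'US', 'City': 'New York'}]
result = merge_by_stride(sample, 2)
```

Both versions produce the same list; which one reads better is largely a matter of taste.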

Additionally, instead of hardcoding k = 4:

  • If you know the number of fields but not the number of people, you can compute k by dividing the initial number of dictionaries by the number of dictionary types:
fields = ['Name', 'Country', 'Email']
assert len(Data) % len(fields) == 0    # make sure Data is consistent with number of fields
k = len(Data) // len(fields)
  • Or alternatively, you can compute k by counting how many occurrences of the 'Name' field you have:
k = sum(1 for d in Data if 'Name' in d)
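
Putting the pieces together, one possible way to package this is a small function that infers k from the 'Name' entries and refuses to run on inconsistent input. This is a sketch, not a definitive implementation: the function name merge_records and its key_field parameter are assumptions introduced here for illustration.

```python
def merge_records(data, key_field='Name'):
    """Merge position-aligned dictionaries into one dictionary per record.

    Hypothetical helper: assumes each record contributes exactly one
    dictionary containing `key_field`, and that the dictionaries for every
    field type appear in the same record order.
    """
    # Number of records = number of dictionaries carrying the key field.
    k = sum(1 for d in data if key_field in d)
    if k == 0 or len(data) % k != 0:
        raise ValueError("data is not consistent with the number of records")
    output = [{} for _ in range(k)]
    for i, d in enumerate(data):
        output[i % k].update(d)
    return output


Data = [{'Name': 'bob', 'age': '40'},
        {'Name': 'tom', 'age': '30'},
        {'Country': 'US', 'City': 'Boston'},
        {'Country': 'US', 'City': 'New York'},
        {'Email': 'bob@fake.com', 'Phone': 'bob phone'},
        {'Email': 'tom@fake.com', 'Phone': 'none'}]
Output = merge_records(Data)
```

The length check only catches missing or extra dictionaries in some cases; it cannot detect two records whose dictionaries were swapped.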
Stef
  • Thanks a lot for your answer. Unfortunately, the dataset contains many more items than those 2; this would work in a case of two.
        @staticmethod
        def _concat_dicts(elements):
            """
            :param elements: A list of dictionaries ex: [{'a':'1', 'b':'2'}, {a:'k', b:'j'}]
            :return: Combined dictionary key and list of values {'a': [ '1','k'], 'b': ['2', 'j']}
            """
            data = defaultdict(list)
            for i in elements:
                for key, value in i.items():
                    data[key].append(value)
            return dict(data)
    – Mr Sarkisian Oct 03 '20 at 09:20
  • This, for example, concatenates the values into key: list pairs, which does the job if I need the data in a dataframe, but I need the data in a primitive form. – Mr Sarkisian Oct 03 '20 at 09:22
  • Thanks a lot, Stef. I think the above works great if the dataset is consistent. Because my dataset was inconsistent, I introduced an id column, added it to all dictionary items, and merged records based on that id. That did the trick. Thank you so much! – Mr Sarkisian Oct 10 '20 at 20:36
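
The id-based merge described in the last comment was not posted, so the following is only a guess at what it might look like. It assumes every dictionary carries an 'id' key naming its record; the function name merge_by_id and the field name 'id' are hypothetical.

```python
from collections import defaultdict


def merge_by_id(data, id_field='id'):
    """Merge dictionaries that share the same id (hypothetical helper).

    Unlike the position-based approach, this still works when some records
    are missing a dictionary or when the dictionaries arrive out of order.
    """
    merged = defaultdict(dict)
    for d in data:
        # Raises KeyError if a dictionary has no id, which is intentional here.
        merged[d[id_field]].update(d)
    # Records come out in order of first appearance of each id.
    return list(merged.values())


rows = [{'id': 1, 'Name': 'bob'},
        {'id': 2, 'Name': 'tom'},
        {'id': 2, 'Country': 'US'},
        {'id': 1, 'Email': 'bob@fake.com'}]
records = merge_by_id(rows)
```

Here bob has no Country entry and tom has no Email entry, yet each record is still assembled from the right dictionaries.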