Removing duplicate key and appending the value of the deleted key in Python

Question

Let's say I've got some results like the ones below from iterating through JSON files.

{257585701: [156173119], 667512043: [228087519], 821360748: [5350676] and more }
{136607969: [13510118], 667512043: [13510118], 257585701: [13510118] and more } 
{....................more data..........} 
{....................more data..........} 
There are hundreds of these.

Now, if I want to remove the duplicate keys and append the values of each removed duplicate to the list of the original key, how can I do that? I'm hoping to get something like this:

{257585701: [156173119, 13510118], 667512043: [228087519, 13510118], 821360748: [5350676], 136607969: [13510118]}

My code is:

import json

filepath = '../data/' # I have subdirectories and lots of JSON files

with open(filepath) as stream:
    data = json.load(stream)

    results = {}

    for item in data['info']['items']:
        cid = item['id']
        for trainer in item['trainer']:
            tid = trainer['id']
            if tid not in trainers:
                trainers[tid] = []
            trainers[tid].append(cid)

    print(results) 
    
    # this print(results) prints the dictionaries I mentioned above, and there are hundreds of them.

you have many dictionaries and they have common keys, you want to merge them all into one dict and the values of the common keys will be appended, right? — ahmedshahriar, Jun 05 '22 at 14:56
I've edited the question for better understanding. Any help is appreciated — Tech, Jun 05 '22 at 14:58
Does this answer your question? [Merge two dictionaries and keep the values for duplicate keys in Python](https://stackoverflow.com/questions/52562882/merge-two-dictionaries-and-keep-the-values-for-duplicate-keys-in-python) — HerrAlvé, Jun 05 '22 at 15:08
@AlveMonke, you're correct when the dictionaries are assigned to variables, but mine are printed dictionaries from 100s of JSON files: print(results) after iterating gives me the result — Tech, Jun 05 '22 at 15:16
@Tech now we're cooking. Append all of those `results` dictionaries to a list called, say, `separate_data`, and then take another look at my answer. — fanjie, Jun 05 '22 at 15:51

score 1 · Answer 1 · answered Jun 05 '22 at 14:58

This iterates over the items of dict2: if a key is already present in the result, its values are appended to the existing list; otherwise the key is added:

dict1 = {257585701: [156173119], 667512043: [228087519], 821360748: [5350676]}
dict2 = {136607969: [13510118], 667512043: [13510118], 257585701: [13510118]}

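# Copy each list so that the merge modifies neither dict1 nor dict2
# (a plain dict3 = dict1 would only create a second name for dict1).
dict3 = {k: list(v) for k, v in dict1.items()}

for k, v in dict2.items():
    if k in dict3:
        dict3[k] += v
    else:
        dict3[k] = list(v)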

print(dict3)

Output:

{257585701: [156173119, 13510118], 667512043: [228087519, 13510118], 821360748: [5350676], 136607969: [13510118]}

I can't really name those dictionaries, as I'm getting them from print() across 100s of JSON files. For example, print(results) prints those dictionaries — Tech, Jun 05 '22 at 15:06
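
One way to address the comment above, sketched under the assumption that each file's dictionary is available as a Python object inside the loop (not only as printed text): merge it into a running dictionary instead of printing it. The names per_file_results and merged are hypothetical and stand in for the asker's real loop.

```python
from collections import defaultdict

# Hypothetical stand-in for the dictionaries produced one per JSON file.
per_file_results = [
    {257585701: [156173119], 667512043: [228087519], 821360748: [5350676]},
    {136607969: [13510118], 667512043: [13510118], 257585701: [13510118]},
]

merged = defaultdict(list)
for results in per_file_results:      # in the real code: one iteration per file
    for key, values in results.items():
        merged[key].extend(values)    # a missing key starts as an empty list

print(dict(merged))
```

No dictionary needs its own name here: each one is folded into merged as soon as it is produced.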

score 1 · Answer 2 · answered Jun 05 '22 at 15:01

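You can start here:

def merge_dicts(*dicts):
    merged = {}
    for current in dicts:  # avoid shadowing the built-in name dict
        for key, values in current.items():
            # extend (not append) so the merged lists stay flat
            merged.setdefault(key, []).extend(values)
    return merged

Pass all the dicts as arguments: merge_dicts(d1, d2, d3, ...)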

ahmedshahriar


I can't merge them, as these are results from print() while iterating through 100s of JSON files. For example, print(results) prints those dictionaries – Tech Jun 05 '22 at 15:08

blackraven · Answer 3 · 2022-08-17T09:15:19.733

You can try to ingest the data string into a list of dictionaries and process it from there.

I'm using dic.get(key, '') instead of dic[key]. Both look up the key, but get returns the given default (here the empty string '') instead of raising a KeyError when the key does not exist. Extending a list with '' via += adds nothing, so missing keys are simply skipped.
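
A small illustration of that difference, using a made-up one-entry dictionary (the sample keys are taken from the question; the variable names are only for this sketch):

```python
dic = {667512043: [228087519]}

present = dic.get(667512043, '')   # key exists: the stored list is returned
missing = dic.get(136607969, '')   # key missing: the default '' is returned
print(present, repr(missing))

try:
    dic[136607969]                 # plain indexing raises instead
except KeyError as err:
    print('KeyError:', err)

values = [1]
values += missing                  # extending with '' leaves the list unchanged
print(values)
```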

data = """{257585701: [156173119], 667512043: [228087519], 821360748: [5350676]}
{136607969: [13510118], 667512043: [13510118], 257585701: [13510118]}
{136607969: [135101], 667512043: [135101], 257585701: [135101]}"""

#dict_list = [eval(e) for e in data.split('\n')]    #NOT safe, do NOT use this!
import ast
dict_list = [ast.literal_eval(e) for e in data.split('\n')]    #use this

Output of dict_list:

[{257585701: [156173119], 667512043: [228087519], 821360748: [5350676]},
 {136607969: [13510118], 667512043: [13510118], 257585701: [13510118]},
 {136607969: [135101], 667512043: [135101], 257585701: [135101]}]

I'm assuming data is the text printed by print(results), with one dictionary per line (separated by the newline \n), so that each line can be parsed into a Python dict as shown above.
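
To make the eval warning concrete, here is a minimal sketch of how ast.literal_eval behaves: it parses plain literals such as dicts, lists and numbers, and raises ValueError for anything else, such as a function call. The input strings are illustrative only.

```python
import ast

line = "{136607969: [13510118], 667512043: [13510118]}"
parsed = ast.literal_eval(line)    # a dict literal is accepted
print(parsed[136607969])

rejected = False
try:
    # A function call is not a literal, so it is refused rather than executed.
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    rejected = True
    print('rejected:', err)
```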

keys = []
result = {}

for dic in dict_list:
    keys.extend(dic.keys())
keys = set(keys)

for key in keys:
    result[key] = []
    for dic in dict_list:
        result[key] += dic.get(key, '')

print(result)

Output:

{136607969: [13510118, 135101],
 667512043: [228087519, 13510118, 135101],
 821360748: [5350676],
 257585701: [156173119, 13510118, 135101]}
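
The key order in the output above comes from iterating over a set, which carries no ordering guarantee. If first-seen key order matters, one possible variation, sketched with the same sample data and an empty-list default in place of '', is to collect the keys with dict.fromkeys:

```python
dict_list = [
    {257585701: [156173119], 667512043: [228087519], 821360748: [5350676]},
    {136607969: [13510118], 667512043: [13510118], 257585701: [13510118]},
    {136607969: [135101], 667512043: [135101], 257585701: [135101]},
]

# dict.fromkeys keeps keys in first-seen order; set() gives no such guarantee.
keys = dict.fromkeys(key for dic in dict_list for key in dic)

result = {
    key: [value for dic in dict_list for value in dic.get(key, [])]
    for key in keys
}
print(result)
```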

@ Black Raven, I get the following error 'dict' object has no attribute 'split' — Tech, Jun 05 '22 at 15:36
you mentioned your data is from print results, and they are separated by new line `\n` correct? You could paste them between the doc-strings `"""your printout here"""` and process from there — blackraven, Jun 05 '22 at 15:41
can you try this? `dict_list = [d for d in results]` or `dict_list = [results]` — blackraven, Jun 06 '22 at 12:24

score 1 · Answer 4 · answered Jun 05 '22 at 16:17

Write functions.

I can't test the code fully because I don't have access to your input. I also had to guess what trainers is: the question's code fills trainers but prints results, so I assume they are meant to be the same per-file dictionary. The following code hopefully approaches a solution.

from collections import defaultdict
import json


def read_one_json(filepath: str) -> dict:
    """Return {trainer id: [item ids]} for a single JSON file."""
    with open(filepath) as stream:
        data = json.load(stream)

    results = {}

    for item in data['info']['items']:
        cid = item['id']
        for trainer in item['trainer']:
            tid = trainer['id']
            if tid not in results:
                results[tid] = []
            results[tid].append(cid)
    return results


def read_jsons(filepaths: list[str]) -> list[dict]:
    jsons = []
    for filepath in filepaths:
        jsons.append(read_one_json(filepath))
    return jsons


def combine_dicts(dicts: list[dict]) -> dict:
    """
    dicts is a list of dicts of (int, [int]) pairs.
    combine_dicts returns a new dict where the values of duplicate keys are combined

    >>> dicts = [{257585701: [156173119], 667512043: [228087519], 821360748: [5350676]}]
    >>> dicts += [{136607969: [13510118], 667512043: [13510118], 257585701: [13510118]}]
    >>> combine_dicts(dicts)
    defaultdict(<class 'list'>, {257585701: [156173119, 13510118], 667512043: [228087519, 13510118], 821360748: [5350676], 136607969: [13510118]})

    """
    combined_data = defaultdict(list)

    for data in dicts:
        for key, value in data.items():
            combined_data[key] += value

    return combined_data


def main() -> None:
    filepaths = [...]  # you supply these
    separate_data = read_jsons(filepaths)
    combined_data = combine_dicts(separate_data)
    print(combined_data)


if __name__ == '__main__':
    main()
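
The filepaths list in main is left for the reader to supply. A minimal sketch of one way to build it, assuming the files end in .json and sit under a directory tree such as the question's '../data/'; the helper name find_json_files is hypothetical, and the demonstration uses a throwaway temporary directory rather than real data:

```python
from pathlib import Path
import tempfile


def find_json_files(root: str) -> list[str]:
    """Return every *.json path under root, including subdirectories, sorted."""
    return sorted(str(path) for path in Path(root).rglob('*.json'))


# Demonstration on a temporary directory tree (a stand-in for '../data/').
with tempfile.TemporaryDirectory() as root:
    (Path(root) / 'sub').mkdir()
    (Path(root) / 'a.json').write_text('{}')
    (Path(root) / 'sub' / 'b.json').write_text('{}')
    (Path(root) / 'notes.txt').write_text('ignored')
    found = find_json_files(root)
    names = [Path(p).name for p in found]
    print(names)
```

The returned list can then be used as filepaths, so that each file is read once and the resulting dictionaries are combined afterwards.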
