
Let's say I get results like the ones below from iterating through JSON files.

{257585701: [156173119], 667512043: [228087519], 821360748: [5350676] and more }
{136607969: [13510118], 667512043: [13510118], 257585701: [13510118] and more } 
{....................more data..........} 
{....................more data..........} 
and hundreds more like these.

Now, if a key appears in more than one dictionary, how can I drop the duplicate key and append its value to the list under the original key? I'm hoping to get something like this:

{257585701: [156173119, 13510118], 667512043: [228087519, 13510118], 821360748: [5350676], 136607969: [13510118]}

My code is:

import json

filepath = '../data/'  # I have subdirectories and tons of JSON files

with open(filepath) as stream:
    data = json.load(stream)

    results = {}

    for item in data['info']['items']:
        cid = item['id']
        for trainer in item['trainer']:
            tid = trainer['id']
            if tid not in trainers:
                trainers[tid] = []
            trainers[tid].append(cid)

    print(results) 
    
    # print(results) prints a dictionary like the ones above for each file, and there are hundreds of them.
    
Tech

4 Answers

1

This iterates over every key in dict2: if the key is already present in dict3, its values are appended to the existing list; otherwise the key is added as a new entry:

dict1 = {257585701: [156173119], 667512043: [228087519], 821360748: [5350676]}
dict2 = {136607969: [13510118], 667512043: [13510118], 257585701: [13510118]}

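# Copy dict1 (including its lists) so that dict1 and dict2 are left untouched
dict3 = {k: list(v) for k, v in dict1.items()}

for k, v in dict2.items():
    if k in dict3:
        dict3[k] += v
    else:
        dict3[k] = list(v)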

print(dict3)

Output:

{257585701: [156173119, 13510118], 667512043: [228087519, 13510118], 821360748: [5350676], 136607969: [13510118]}
The Thonnu
  • I can't really give those dictionaries names, since I'm getting the results from print() across hundreds of JSON files. For example, print(results) prints those dictionaries – Tech Jun 05 '22 at 15:06
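The dictionaries do not need individual names to be merged. A minimal sketch, assuming each per-file dictionary is appended to a list instead of only being printed (all_results and merge_all are hypothetical names, not part of the original code):

```python
def merge_all(dicts):
    """Merge an iterable of {key: [values]} dicts into one new dict."""
    merged = {}
    for d in dicts:
        for key, values in d.items():
            # setdefault creates the list the first time a key is seen;
            # extend copies the items, so the input lists are never modified.
            merged.setdefault(key, []).extend(values)
    return merged


# Hypothetical stand-in for the dictionaries collected from each file
all_results = [
    {257585701: [156173119], 667512043: [228087519], 821360748: [5350676]},
    {136607969: [13510118], 667512043: [13510118], 257585701: [13510118]},
]
merged = merge_all(all_results)
print(merged)
```

With the two sample dictionaries from the question this prints the desired result, and the same loop works unchanged for hundreds of dictionaries.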
1

You can start here:

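def merge_dicts(*dicts):
    merged = {}
    for d in dicts:  # "d" avoids shadowing the built-in name dict
        for key in d:
            try:
                # extend keeps one flat list; append would nest the lists
                merged[key].extend(d[key])
            except KeyError:
                # first time this key is seen: start from a copy of its list
                merged[key] = list(d[key])
    return merged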

Pass all the dicts to the function: merge_dicts(d1, d2, d3, ...)

ahmedshahriar
  • I can't merge them, as these are the results of print() while iterating through hundreds of JSON files. For example, print(results) prints those dictionaries – Tech Jun 05 '22 at 15:08
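One way around printing and re-merging is to accumulate into a single dictionary while the files are being read. The sketch below assumes every file ends in .json and has the data['info']['items'] layout shown in the question; collect_trainers is a hypothetical helper name, and pathlib's rglob is used to walk the subdirectories:

```python
import json
from collections import defaultdict
from pathlib import Path


def collect_trainers(root):
    """Build one {trainer id: [item ids]} dict from every .json file under root."""
    combined = defaultdict(list)
    # rglob descends into subdirectories; sorting makes the order repeatable
    for path in sorted(Path(root).rglob('*.json')):
        with open(path) as stream:
            data = json.load(stream)
        # Assumed layout, taken from the question's code
        for item in data['info']['items']:
            cid = item['id']
            for trainer in item['trainer']:
                combined[trainer['id']].append(cid)
    return dict(combined)
```

Calling print(collect_trainers('../data/')) would then print one merged dictionary instead of one dictionary per file.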
1

You can try to parse the data string into a list of dictionaries and process it from there.

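I'm using dic.get(key, '') instead of dic[key]: it does the same lookup, but without raising a KeyError if the key does not exist. When the key is missing, it returns the specified default, the empty string '', and adding an empty string to a list with += leaves the list unchanged (an empty list [] as the default would behave the same way).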

data = """{257585701: [156173119], 667512043: [228087519], 821360748: [5350676]}
{136607969: [13510118], 667512043: [13510118], 257585701: [13510118]}
{136607969: [135101], 667512043: [135101], 257585701: [135101]}"""

#dict_list = [eval(e) for e in data.split('\n')]    #NOT safe, do NOT use this!
import ast
dict_list = [ast.literal_eval(e) for e in data.split('\n')]    #use this

Output of dict_list:

[{257585701: [156173119], 667512043: [228087519], 821360748: [5350676]},
 {136607969: [13510118], 667512043: [13510118], 257585701: [13510118]},
 {136607969: [135101], 667512043: [135101], 257585701: [135101]}]

I'm assuming the data comes from the printed results, with one dictionary per line separated by \n, so each line can be parsed into a Python dict as above.

keys = []
result = {}

for dic in dict_list:
    keys.extend(dic.keys())
keys = set(keys)

for key in keys:
    result[key] = []
    for dic in dict_list:
        result[key] += dic.get(key, '')

print(result)

Output:

{136607969: [13510118, 135101],
 667512043: [228087519, 13510118, 135101],
 821360748: [5350676],
 257585701: [156173119, 13510118, 135101]}
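If the same value can end up more than once under a key after merging (an assumption; the sample data does not show this), the repeats can be dropped afterwards. A small sketch, where drop_repeated_values is a hypothetical helper name:

```python
def drop_repeated_values(merged):
    """Return a new dict whose lists keep only the first occurrence of each value."""
    # dict.fromkeys keeps insertion order, so the original order of values survives
    return {key: list(dict.fromkeys(values)) for key, values in merged.items()}


# Hypothetical merged data with one repeated value
merged = {667512043: [228087519, 13510118, 13510118], 821360748: [5350676]}
print(drop_repeated_values(merged))
# {667512043: [228087519, 13510118], 821360748: [5350676]}
```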
blackraven
1

Write functions.

I can't test the code fully because I don't have access to your input. I also had to guess what trainers was meant to be; here I assume it is the same dict as results, so each file produces its own dict and the dicts are combined afterwards. The following code hopefully approaches a solution.

from collections import defaultdict
import json


def read_one_json(filepath: str) -> dict:
    """Return {trainer id: [item ids]} for a single JSON file."""
    with open(filepath) as stream:
        data = json.load(stream)

    results = {}

    for item in data['info']['items']:
        cid = item['id']
        for trainer in item['trainer']:
            tid = trainer['id']
            if tid not in results:
                results[tid] = []
            results[tid].append(cid)
    return results


def read_jsons(filepaths: list[str]) -> list[dict]:
    jsons = []
    for filepath in filepaths:
        jsons.append(read_one_json(filepath))
    return jsons


def combine_dicts(dicts: list[dict]) -> dict:
    """
    dicts is a list of dicts of (int, [int]) pairs.
    combine_dicts returns a new dict where the values of duplicate keys are combined.

    >>> dicts = [{257585701: [156173119], 667512043: [228087519], 821360748: [5350676]}]
    >>> dicts += [{136607969: [13510118], 667512043: [13510118], 257585701: [13510118]}]
    >>> combine_dicts(dicts)
    defaultdict(<class 'list'>, {257585701: [156173119, 13510118], 667512043: [228087519, 13510118], 821360748: [5350676], 136607969: [13510118]})

    """
    combined_data = defaultdict(list)

    for data in dicts:
        for key, value in data.items():
            # += on a list extends it in place, so values stay in one flat list
            combined_data[key] += value

    return combined_data


def main() -> None:
    filepaths = [...]  # you supply these
    separate_data = read_jsons(filepaths)
    combined_data = combine_dicts(separate_data)
    print(combined_data)


if __name__ == '__main__':
    main()

fanjie