-1

I would like to use python to merge multiple JSON objects with duplicate or more fields with different data into one object by adding additional fields to the merged object. In this example. all of the fields have matching data except for the "audience type" field.

{
    "Person Username": "bob@company.com",
    "Person Status": "Active",
    "Person E-mail": "bob@company.com",
    "Person First Name": "Ann",
    "Person Last Name": "Smith",
    "Person Audience Type": "General Managers",
    "Person Organization Name": "mycompany",
    "Person Organization Number": "US3058",
},
{
    "Person Username": "bob@company.com",
    "Person Status": "Active",
    "Person E-mail": "bob@company.com",
    "Person First Name": "Ann",
    "Person Last Name": "Smith",
    "Person Audience Type": "Managers",
    "Person Organization Name": "mycompany",
    "Person Organization Number": "US3058",
   },
{
    "Person Username": "bob@company.com",
    "Person Status": "Active",
    "Person E-mail": "bob@company.com",
    "Person First Name": "Ann",
    "Person Last Name": "Smith",
    "Person Audience Type": "Owners",
    "Person Organization Name": "mycompany",
    "Person Organization Number": "US3058",
},

I would like this to look like:

{
    "Person Username": "bob@company.com",
    "Person Status": "Active",
    "Person E-mail": "bob@company.com",
    "Person First Name": "Ann",
    "Person Last Name": "Smith",
    "Person Audience Type1": "General Managers",
    "Person Audience Type2": "Managers",
    "Person Audience Type3": "Owners",
    "Person Organization Name": "mycompany",
    "Person Organization Number": "US3058",
},

My code:

import csv
import json

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []

    #read csv file
    with open(csvFilePath, encoding='utf-8') as csvf:
        #load csv file data using csv library's dictionary reader
        csvReader = csv.DictReader(csvf)

        #convert each csv row into python dict
        for row in csvReader:
            #add this python dict to json array
            jsonArray.append(row)

    #convert python jsonArray to JSON String and write to file
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)

csvFilePath = r'data.csv'
jsonFilePath = r'data.json'
csv_to_json(csvFilePath, jsonFilePath)
martineau
  • 119,623
  • 25
  • 170
  • 301

2 Answers2

0

I assume that your data is in the list of dictionaries

data =[
    {
        "Person Username": "bob@company.com",
        "Person Status": "Active",
        "Person E-mail": "bob@company.com",
        "Person First Name": "Ann",
        "Person Last Name": "Smith",
        "Person Audience Type": "General Managers",
        "Person Organization Name": "mycompany",
        "Person Organization Number": "US3058",
    },
    {
        "Person Username": "bob@company.com",
        "Person Status": "Active",
        "Person E-mail": "bob@company.com",
        "Person First Name": "Ann",
        "Person Last Name": "Smith",
        "Person Audience Type": "Managers",
        "Person Organization Name": "mycompany",
        "Person Organization Number": "US3058",
       },
    {
        "Person Username": "bob@company.com",
        "Person Status": "Active",
        "Person E-mail": "bob@company.com",
        "Person First Name": "Ann",
        "Person Last Name": "Smith",
        "Person Audience Type": "Owners",
        "Person Organization Name": "mycompany",
        "Person Organization Number": "US3058",
    }
]

Then, I have applied the following logic and get your desired output

d1 = {}
i = 1
v_list = [d[k] for d in data for k,v in d.items()]
for d in data:
    for k,v in d.items():
        if v_list.count(d[k])>1:
            d1[k] = v
        else:
            d1[k+""+str(i)] = v
            i += 1

Output looks like:

{'Person Username': 'bob@company.com',
 'Person Status': 'Active',
 'Person E-mail': 'bob@company.com',
 'Person First Name': 'Ann',
 'Person Last Name': 'Smith',
 'Person Audience Type1': 'General Managers',
 'Person Organization Name': 'mycompany',
 'Person Organization Number': 'US3058',
 'Person Audience Type2': 'Managers',
 'Person Audience Type3': 'Owners'}
Dejene T.
  • 973
  • 8
  • 14
  • 1
    [A code-only answer is not high quality](https://meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers). While this code may answer the question, which might provide some limited, immediate help. A [proper explanation](https://meta.stackexchange.com/q/114762) would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. "I have applied the following logic" is not an explanation. – martineau Jul 16 '22 at 19:15
  • is there a way to make it work with multiple users (different username)? I tried that code and it works but only if I have 1 user. If I add additional users it returns only the last user in the list and it incorrectly applies the wrong audience type from the first user to the returned user (last user in the list) . – silentmyst Jul 17 '22 at 16:45
0

this algorithm solves exactly what you ask, hope this helps

from json import dump, dumps


def json_formatter(data_list:dict)->dict:
    data_with_list = {}
    #transforming every key in a list 
    for data in data_list:
        for key,value in data.items():
            #means that is a new key
            if key not in data_with_list.keys():
                data_with_list[key] = [value]
            else:
                #avoid pushing repeated itens
                if value not in data_with_list[key]:
                    data_with_list[key].append(value)

    #transforming list 
    formated = {}
    for key,value in data_with_list.items():
        #means its unique
        if len(value) == 1:
            formated[key] = value[0]
        else:
            for x in range(1,len(value)+1):
                formated[f'{key}{x}'] = value[x -1]

    return formated
   

data = [
        {
    "Person Username": "bob@company.com",
    "Person Status": "Active",
    "Person E-mail": "bob@company.com",
    "Person First Name": "Ann",
    "Person Last Name": "Smith",
    "Person Audience Type": "General Managers",
    "Person Organization Name": "mycompany",
    "Person Organization Number": "US3058",
    },
    {
    "Person Username": "bob@company.com",
    "Person Status": "Active",
    "Person E-mail": "bob@company.com",
    "Person First Name": "Ann",
    "Person Last Name": "Smith",
    "Person Audience Type": "Managers",
    "Person Organization Name": "mycompany",
    "Person Organization Number": "US3058",
   },
   {
        "Person Username": "bob@company.com",
        "Person Status": "Active",
        "Person E-mail": "bob@company.com",
        "Person First Name": "Ann",
        "Person Last Name": "Smith",
        "Person Audience Type": "Owners",
        "Person Organization Name": "mycompany",
        "Person Organization Number": "US3058",
    },
       {
        "Person Username": "bob@company.comsss",
        "Person Status": "Active",
        "Person E-mail": "bob@company.com",
        "Person First Name": "Ann",
        "Person Last Name": "Smith",
        "Person Audience Type": "Ownerswww",
        "Person Organization Name": "mycompany",
        "Person Organization Number": "US3058",
    }
]

r = json_formatter(data)
dump(r,open('teste.json','w'),indent=4)
martineau
  • 119,623
  • 25
  • 170
  • 301
  • While this code may answer the question, it would be better to include some context, explaining _how_ it works and _when_ to use it. Code-only answers are not useful in the long run. – martineau Jul 16 '22 at 19:16
  • Thank you for the code and adding the extra data. Using the added data brought up a quick question. What if I only needed specific fields to be combined such as audience type and a new dictionary would only be created for every different username. Only audience type would be combined and numbered. Would it be possible to make it so only the username can be different between the dictionaries and all other fields should be combined. – silentmyst Jul 17 '22 at 21:41