My code takes the path to a JSON file, opens and parses it, and prints out the desired values with the help of a CSV mapping file (which tells it which keys to look for and what name to print each value out as).

Some JSON files, however, have keys that hold multiple values. For example, a JSON file with the key "Affiliate" will have more key/value pairs inside of it instead of just a single value.

How can I parse within a key like this one and print out only the 'true' value rather than the 'false' ones? Currently my code prints out the entire array of key/value pairs within that target key.

Example json:

"Affiliate": [
    {
        "ov": true,
        "value": "United States",
        "lookupCode": "US"
    },
    {
        "ov": false,
        "value": "France",
        "lookupCode": "FR"
    }
]

My code:

import json
import csv

output_dict = {}

#maps csv and json information 
def findValue(json_obj, target_key, output_key):
    for key in json_obj:
        if isinstance(json_obj[key], dict):
            findValue(json_obj[key], target_key, output_key)
        else:
            if target_key == key:
                output_dict[output_key] = json_obj[key]

#Opens and parses json file
file = open('source_data.json', 'r')
json_read = file.read()
obj = json.loads(json_read)

#Opens and parses csv file (mapping)
with open('inputoutput.csv') as csvfile:
    fr = csv.reader(csvfile)
    for row in fr:
        findValue(obj, row[0], row[1])

#creates/writes into json file 
with open("output.json", "w") as out: 
    json.dump(output_dict, out, indent=4)
mkrieger1
  • can you add the relevant line for the `inputoutput.csv` to your question? Also, you're not closing `source_data.json`. I advise you to use the `with open` pattern there as well... – Edo Akse Jul 02 '21 at 12:44
  • @Edo Akse the csv file will contain lines only like this "LastModifiedDate,date_modified" where the first input is the target key and the second is the output key – Jackson Jul 02 '21 at 16:26
  • I'm not exactly sure what the end result is supposed to be ATM. You don't want to print out the whole value for the key `Affiliate`, but how would you determine exactly which part of the list of values to output? – Edo Akse Jul 03 '21 at 09:39
  • Yeah, so basically the csv file will tell the program what key words to look for, so within cases like "affiliate", I guess I would have to alter the program so that it checks if the value for the key word 'ov' is true, and if it is, then it returns the value/payload of the key 'value' that corresponds with the true ov. And just some final context: the program creates and puts the values (that we searched for in json) with their corresponding output words (that we are giving them), so the csv file will be Affiliate,Cntr and the json file that gets created would look something like "Cntr" {United States} – Jackson Jul 04 '21 at 18:31

1 Answer

You'll need to change the way the mapping CSV is structured, because it needs extra columns that say which criterion to check and which value to return when that criterion is met.

Please note that with the logic implemented below, if two list items in Affiliate both have the key ov set to true, only the last one will be kept: each key in the output dict holds a single value, so a later match overwrites an earlier one. You could put a return where I commented in the code, but then it would only use the first one.
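
To make the first-match versus last-match trade-off concrete, here is a minimal sketch. The helper names `first_match` and `all_matches` are hypothetical (they are not part of the code in this answer), and the third list item is added purely for illustration:

```python
# Data mirrors the "Affiliate" list from the question; the "Canada" item
# is an assumption added so that two items satisfy the criterion.
affiliates = [
    {"ov": True, "value": "United States", "lookupCode": "US"},
    {"ov": False, "value": "France", "lookupCode": "FR"},
    {"ov": True, "value": "Canada", "lookupCode": "CA"},
]


def first_match(items, criteria_key, criteria_value, return_key):
    """Return the value of the first item that meets the criterion, or None."""
    return next(
        (item[return_key] for item in items if item.get(criteria_key) == criteria_value),
        None,
    )


def all_matches(items, criteria_key, criteria_value, return_key):
    """Return the values of every item that meets the criterion, in order."""
    return [item[return_key] for item in items if item.get(criteria_key) == criteria_value]


print(first_match(affiliates, "ov", True, "value"))
print(all_matches(affiliates, "ov", True, "value"))
```

Storing the list from `all_matches` under the output key would be one way to keep every match instead of silently dropping all but one; whether that suits the output JSON depends on what consumes it.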

I've restructured the CSV as below:

inputoutput.csv

Affiliate,Cntr,ov,true,value
Sample1,Output1,,,
Sample2,Output2,criteria2,true,returnvalue
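
If some mapping rows only have the original two columns (such as the `LastModifiedDate,date_modified` row mentioned in the comments), one possible approach is to pad every row to five fields before using it. This is only a sketch: `read_mapping` is a hypothetical helper, and the CSV text is inlined with `io.StringIO` so the example is self-contained.

```python
import csv
import io

# Inline stand-in for inputoutput.csv; the last row has only two columns.
SAMPLE_CSV = """Affiliate,Cntr,ov,true,value
Sample1,Output1,,,
LastModifiedDate,date_modified
"""


def read_mapping(csvfile):
    """Yield (target_key, output_key, criteria_key, criteria_value, return_key)."""
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip blank lines
        # Pad short rows with empty strings so every row has exactly five fields.
        yield tuple((row + [""] * 5)[:5])


rows = list(read_mapping(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row)
```

With this shape, an empty third field consistently means "no criterion", regardless of how many columns the row had in the file.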

The JSON I used as the source data is this one:

source_data.json

{
    "Affiliate": [
        {
            "ov": true,
            "value": "United States",
            "lookupCode": "US"
        },
        {
            "ov": false,
            "value": "France",
            "lookupCode": "FR"
        }
    ],
    "Sample1": "im a value",
    "Sample2": [
        {
            "criteria2": false,
            "returnvalue": "i am not a return value"
        },
        {
            "criteria2": true,
            "returnvalue": "i am a return value"
        }
    ]
}

The actual code is below; note that I commented a bit on my choices.

main.py

import json
import csv


output_dict = {}


def str2bool(value: str) -> bool:
    """convert a str such as "true" or "1" to a bool"""
    # parameter is named "value" to avoid shadowing the builtin input()
    # shamelessly stolen from:
    # https://stackoverflow.com/a/715468/9267296
    return value.lower() in ("yes", "true", "t", "1")


def findValue(
    json_obj,
    target_key,
    output_key,
    criteria_key=None,
    criteria_value=False,
    return_key="",
):
    """maps csv and json information"""
    # ^^ use PEP standard for docstrings:
    # https://www.python.org/dev/peps/pep-0257/#id16

    # "global" is not strictly required here, because output_dict is only
    # mutated and never reassigned; it is kept to make the dependency explicit
    # see https://www.w3schools.com/python/gloss_python_global_scope.asp
    global output_dict

    for key in json_obj:
        if isinstance(json_obj[key], dict):
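            # pass the criteria along, otherwise they are lost for nested keys
            findValue(
                json_obj[key],
                target_key,
                output_key,
                criteria_key,
                criteria_value,
                return_key,
            )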

        # in this case I advise to use "elif" instead of the "else: if..."
        elif target_key == key:
            # so this is the actual logic change.
            if isinstance(json_obj[key], list):
                for item in json_obj[key]:
                    if (
                        criteria_key is not None
                        and isinstance(item, dict)
                        and criteria_key in item
                        and item[criteria_key] == criteria_value
                    ):
                        output_dict[output_key] = item[return_key]
                        # here you could put a return
            else:
                # this part doesn't change
                output_dict[output_key] = json_obj[key]
                # since we found the key and added in the output_dict
                # you can return here to slightly speed up the total
                # execution time
                return


# Opens and parses json file
with open("source_data.json") as sourcefile:
    json_obj = json.load(sourcefile)


# Opens and parses csv file (mapping)
with open("inputoutput.csv") as csvfile:
    fr = csv.reader(csvfile)
    for row in fr:
        # this check determines whether the row carries criteria
        # row[2] would be the key to check
        # row[3] would be the value that the key needs to have
        # row[4] would be the key for which to return the value
        # the len() check lets plain two-column rows (target,output) still work
        if len(row) >= 5 and row[2] != "":
            findValue(json_obj, row[0], row[1], row[2], str2bool(row[3]), row[4])
        else:
            findValue(json_obj, row[0], row[1])


# Creates/writes into json file
with open("output.json", "w") as out:
    json.dump(output_dict, out, indent=4)

Running the above code with the input files I provided results in the following file:

output.json

{
    "Cntr": "United States",
    "Output1": "im a value",
    "Output2": "i am a return value"
}

I know there are ways to optimize this, but I wanted to keep it close to the original. You might need to adjust exactly how values are added to output_dict to get the output JSON you want.

Edo Akse