So you'll need to change the way the mapping CSV is structured: it needs extra columns that say which criterion to check and which value to return when that criterion is met.
Please note that with the logic implemented below, if two list items in Affiliate both have the key ov set to true, only the last one will be added (dict keys are unique). You could put a return where I commented in the code, but then it would of course only use the first one.
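To make the "last one wins" behaviour concrete, here is a minimal sketch. The two-item list is hypothetical data (both items have ov set to true, unlike the sample JSON further down), and `break` stands in for the `return` mentioned above.

```python
# Hypothetical data: two list items that both satisfy the criteria.
items = [
    {"ov": True, "value": "United States"},
    {"ov": True, "value": "France"},
]

# Last match wins: every match assigns to the same dict key,
# so later matches overwrite earlier ones.
last_wins = {}
for item in items:
    if item.get("ov") is True:
        last_wins["Cntr"] = item["value"]

# First match wins: stop at the first hit
# (the equivalent of putting a return at that point).
first_wins = {}
for item in items:
    if item.get("ov") is True:
        first_wins["Cntr"] = item["value"]
        break

print(last_wins)   # {'Cntr': 'France'}
print(first_wins)  # {'Cntr': 'United States'}
```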
I've restructured the CSV as below:
inputoutput.csv
Affiliate,Cntr,ov,true,value
Sample1,Output1,,,
Sample2,Output2,criteria2,true,returnvalue
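Note that the CSV has no header row: every line is a mapping, and the columns are the JSON key to look for, the output key, the criteria key, the value the criteria key must have, and the key whose value should be returned. As a rough sketch of how those columns can be read, the snippet below parses the same three rows into named fields. The names `MappingRow` and `parse_mapping` are my own illustration, not something the code further down relies on.

```python
import csv
import io
from typing import NamedTuple, Optional


class MappingRow(NamedTuple):
    """Hypothetical named view of one mapping row."""
    target_key: str                  # JSON key to look for
    output_key: str                  # key to write in the output
    criteria_key: Optional[str]      # key a list item must contain
    criteria_value: Optional[bool]   # value that key must have
    return_key: Optional[str]        # key whose value gets returned


def parse_mapping(text: str) -> list:
    """Parse header-less mapping CSV text into MappingRow tuples."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        has_criteria = row[2] != ""
        truthy = row[3].lower() in ("yes", "true", "t", "1")
        rows.append(
            MappingRow(
                target_key=row[0],
                output_key=row[1],
                criteria_key=row[2] if has_criteria else None,
                criteria_value=truthy if has_criteria else None,
                return_key=row[4] if has_criteria else None,
            )
        )
    return rows


MAPPING_CSV = """Affiliate,Cntr,ov,true,value
Sample1,Output1,,,
Sample2,Output2,criteria2,true,returnvalue
"""

mapping = parse_mapping(MAPPING_CSV)
for m in mapping:
    print(m)
```

Rows with empty criteria columns (like Sample1) are treated as a plain key-to-key copy.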
The JSON I used as the source data is this one:
source_data.json
{
    "Affiliate": [
        {
            "ov": true,
            "value": "United States",
            "lookupCode": "US"
        },
        {
            "ov": false,
            "value": "France",
            "lookupCode": "FR"
        }
    ],
    "Sample1": "im a value",
    "Sample2": [
        {
            "criteria2": false,
            "returnvalue": "i am not a return value"
        },
        {
            "criteria2": true,
            "returnvalue": "i am a return value"
        }
    ]
}
The actual code is below; note that I commented a bit on my choices.
main.py
import json
import csv

output_dict = {}


def str2bool(value: str) -> bool:
    """simple check to see if a str is a bool"""
    # shamelessly stolen from:
    # https://stackoverflow.com/a/715468/9267296
    return value.lower() in ("yes", "true", "t", "1")


def findValue(
    json_obj,
    target_key,
    output_key,
    criteria_key=None,
    criteria_value=False,
    return_key="",
):
    """maps csv and json information"""
    # ^^ use PEP standard for docstrings:
    # https://www.python.org/dev/peps/pep-0257/#id16

    # declaring output_dict as global makes it explicit that we write to
    # the module-level dict (not strictly required, since we only mutate
    # it and never rebind it)
    # see https://www.w3schools.com/python/gloss_python_global_scope.asp
    global output_dict

    for key in json_obj:
        if isinstance(json_obj[key], dict):
            # pass the criteria along so nested dicts are searched
            # with the same rules
            findValue(
                json_obj[key],
                target_key,
                output_key,
                criteria_key,
                criteria_value,
                return_key,
            )

        # in this case I advise to use "elif" instead of the "else: if..."
        elif target_key == key:
            # so this is the actual logic change.
            if isinstance(json_obj[key], list):
                for item in json_obj[key]:
                    if (
                        criteria_key is not None
                        and criteria_key in item
                        and item[criteria_key] == criteria_value
                    ):
                        output_dict[output_key] = item[return_key]
                        # here you could put a return
            else:
                # this part doesn't change
                output_dict[output_key] = json_obj[key]
                # since we found the key and added it to the output_dict
                # you can return here to slightly speed up the total
                # execution time
                return


# Opens and parses json file
with open("source_data.json") as sourcefile:
    json_obj = json.load(sourcefile)

# Opens and parses csv file (mapping)
with open("inputoutput.csv", newline="") as csvfile:
    fr = csv.reader(csvfile)
    for row in fr:
        # this check is to determine if you need to add criteria
        # row[2] would be the key to check
        # row[3] would be the value that the key needs to have
        # row[4] would be the key for which to return the value
        if row[2] != "":
            findValue(json_obj, row[0], row[1], row[2], str2bool(row[3]), row[4])
        else:
            findValue(json_obj, row[0], row[1])

# Creates/writes into json file
with open("output.json", "w") as out:
    json.dump(output_dict, out, indent=4)
Running the above code with the input files I provided results in the following file:
output.json
{
    "Cntr": "United States",
    "Output1": "im a value",
    "Output2": "i am a return value"
}
I know there are ways to optimize this, but I wanted to keep it close to the original. You might need to play with the exact way you add entries to output_dict to get the exact output JSON you want.
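As one possible direction, here is a sketch of a variant that returns its matches instead of writing to a global dict, and keeps every matching list item rather than only the last one. The name `find_values` and the list-valued output shape are assumptions of mine, and the inline data is a modified copy of the sample (France has ov set to true here so that two items match).

```python
def find_values(
    json_obj,
    target_key,
    criteria_key=None,
    criteria_value=False,
    return_key="",
):
    """Return a list with every value found for target_key.

    Hypothetical variant of findValue: no global state, and all
    matching list items are kept.
    """
    matches = []
    for key, value in json_obj.items():
        if isinstance(value, dict):
            # search nested dicts with the same criteria
            matches.extend(
                find_values(
                    value, target_key, criteria_key, criteria_value, return_key
                )
            )
        elif key == target_key:
            if isinstance(value, list):
                for item in value:
                    if (
                        criteria_key is not None
                        and isinstance(item, dict)
                        and item.get(criteria_key) == criteria_value
                        and return_key in item
                    ):
                        matches.append(item[return_key])
            else:
                matches.append(value)
    return matches


# Modified sample data: both Affiliate items match the criteria.
source = {
    "Affiliate": [
        {"ov": True, "value": "United States", "lookupCode": "US"},
        {"ov": True, "value": "France", "lookupCode": "FR"},
    ],
    "Sample1": "im a value",
}

output = {
    "Cntr": find_values(source, "Affiliate", "ov", True, "value"),
    "Output1": find_values(source, "Sample1"),
}
print(output)
# {'Cntr': ['United States', 'France'], 'Output1': ['im a value']}
```

Whether a list per output key is what you want depends on the consumer of output.json; taking the first or last element of each list gets you back to the single-value behaviour.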