I am new to JSON and tried what has been proposed here, but I could not get it to work.
My original file (abbreviated) is called test.csv and looks like this:
person_uuid sample_uuid sample_slot sample_info
aa AB A anything
aa BD B more info
bc FD A just info
bc AD B even more info
bc OI C text
hu KL B texttext
hu HF C information
The script I am trying to convert it with is called csv2json.py:
import csv
import json
import sys

base_name = sys.argv[1]
csvFilePath = "data/"+base_name+".csv"
jsonFilePath = "data/"+base_name+".json"

# https://stackoverflow.com/a/53474378/8584652
primary_fields = ['person_uuid']
secondary_fields = ['sample_slot']
result = []
with open(csvFilePath) as csv_file:
    reader = csv.DictReader(csv_file, delimiter='\t', skipinitialspace=True)
    for row in reader:
        d = {k: v for k, v in row.items() if k in primary_fields}
        e = {k: v for k, v in row.items() if k in secondary_fields}
        d['samples'] = [{k: v, }
                        for k, v in row.items() if k not in primary_fields]
        result.append(d)

# convert python jsonArray to JSON String and write to file
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
    jsonString = json.dumps(result, indent=4)
    jsonf.write(jsonString)
I invoke the conversion with python csv2json.py test
and I get this as the result:
[
{
"person_uuid": "aa",
"samples": [
{
"sample_uuid": "AB"
},
{
"sample_slot": "A"
},
{
"sample_info": "anything"
}
]
},
{
"person_uuid": "aa",
"samples": [
{
"sample_uuid": "BD"
},
{
"sample_slot": "B"
},
{
"sample_info": "more info"
}
]
},
{
"person_uuid": "bc",
"samples": [
{
"sample_uuid": "FD"
},
{
"sample_slot": "A"
},
{
"sample_info": "just info "
}
]
},
{
"person_uuid": "bc",
"samples": [
{
"sample_uuid": "AD"
},
{
"sample_slot": "B"
},
{
"sample_info": "even more info "
}
]
},
{
"person_uuid": "bc",
"samples": [
{
"sample_uuid": "OI"
},
{
"sample_slot": "C"
},
{
"sample_info": "text"
}
]
},
{
"person_uuid": "hu",
"samples": [
{
"sample_uuid": "KL"
},
{
"sample_slot": "B"
},
{
"sample_info": "texttext"
}
]
},
{
"person_uuid": "hu",
"samples": [
{
"sample_uuid": "HF"
},
{
"sample_slot": "C"
},
{
"sample_info": "information"
}
]
}
]
But I would like to get instead:
[
{
"person_uuid": "aa",
"samples": {
"A": {
"sample_uuid": "AB",
"sample_info": "anything"
},
"B": {
"sample_uuid": "BD",
"sample_info": "more info"
}
}
},
{
"person_uuid": "bc",
"samples": {
"A": {
"sample_uuid": "FD",
"sample_info": "just info"
},
"B": {
"sample_uuid": "AD",
"sample_info": "even more info"
},
"C": {
"sample_uuid": "OI",
"sample_info": "text"
}
}
},
{
"person_uuid": "hu",
"samples": {
"B": {
"sample_uuid": "KL",
"sample_info": "texttext"
},
"C": {
"sample_uuid": "HF",
"sample_info": "information"
}
}
}
]
Any help on how to nest this properly would be appreciated. My attempt was the line e = {k: v for k, v in row.items() if k in secondary_fields}, but e is never used when building the result.
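To make the target structure concrete, here is a minimal sketch of the grouping I am after. It is not a tested solution for the real file. It assumes that the rows are tab-separated, that person_uuid identifies a person, and that sample_slot is unique per person. The function name nest_samples and the inline sample string (standing in for data/test.csv) are made up for illustration, and stripping whitespace from the values is my own assumption, added because some sample_info values come out with trailing spaces.

```python
import csv
import io
import json


def nest_samples(rows, primary_field="person_uuid", slot_field="sample_slot"):
    """Group rows by primary_field and key each person's samples by slot_field."""
    people = {}  # person_uuid -> record; dicts keep insertion order in Python 3.7+
    for row in rows:
        # Create the person's record on first sight, reuse it afterwards.
        person = people.setdefault(
            row[primary_field],
            {primary_field: row[primary_field], "samples": {}},
        )
        # Everything that is neither the person key nor the slot goes under the slot.
        # Assumption: surrounding whitespace in values is not meaningful.
        person["samples"][row[slot_field]] = {
            k: (v or "").strip()
            for k, v in row.items()
            if k not in (primary_field, slot_field)
        }
    return list(people.values())


# Hypothetical inline data mirroring the abbreviated test.csv from the question.
sample = (
    "person_uuid\tsample_uuid\tsample_slot\tsample_info\n"
    "aa\tAB\tA\tanything\n"
    "aa\tBD\tB\tmore info\n"
    "bc\tFD\tA\tjust info \n"
    "bc\tAD\tB\teven more info \n"
    "bc\tOI\tC\ttext\n"
    "hu\tKL\tB\ttexttext\n"
    "hu\tHF\tC\tinformation\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
result = nest_samples(reader)
print(json.dumps(result, indent=4))
```

With a real file, the io.StringIO(sample) part would be replaced by the opened csv_file from the script above. If a person had two rows with the same sample_slot, the later row would silently overwrite the earlier one in this sketch.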