1

I’m trying to convert a flat structured CSV into a nested JSON structure. The CSV is generated from SQL which creates multiple rows for each primary id. The CSV is structured as follows:

PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spike
100,John,Smith,NewYork,BMW,Spike
100,John,Smith,NewYork,Toyota,Rusty
100,John,Smith,NewYork,BMW,Rusty
101,Ben,Swan,Sydney,Volkswagen,Buddy
101,Ben,Swan,Sydney,Ford,Buddy
101,Ben,Swan,Sydney,Audi,Buddy
101,Ben,Swan,Sydney,Volkswagen,Max
101,Ben,Swan,Sydney,Ford,Max
101,Ben,Swan,Sydney,Audi,Max
102,Julia,Brown,London,Mini,Lucy

The desired JSON output is:

{
    "data": [
        {
            "City": "NewYork", 
            "FirstName": "John", 
            "PrimaryId": 100, 
            "LastName": "Smith", 
            "CarName": [
                "Toyota", 
                "BMW"
            ], 
            "DogName": [
                "Spike", 
                "Rusty"
            ]
        }, 
        {
            "City": "Sydney", 
            "FirstName": "Ben", 
            "PrimaryId": 101, 
            "LastName": "Swan", 
            "CarName": [
                "Volkswagen", 
                "Ford", 
                "Audi"
            ], 
            "DogName": [
                "Buddy", 
                "Max"
            ]
        }, 
        {
            "City": "London", 
            "FirstName": "Julia", 
            "PrimaryId": 102, 
            "LastName": "Brown", 
            "CarName": [
                "Mini"
            ], 
            "DogName": [
                "Lucy"
            ]
        }
    ]
}

Both this post and this one have helped but I'm yet to create the correct structure.

martineau
  • 119,623
  • 25
  • 170
  • 301
David
  • 103
  • 1
  • 1
  • 7

2 Answers2

10

Here's the general way of doing so with csv.DictReader.

Start by loading the data:

import csv
import itertools
with open('stuff.csv', 'rb') as csvfile:
    all_ = list(csv.DictReader(csvfile))

Now, you can use itertools.groupby to group and process each group. For example

d = []
for k, g in itertools.groupby(
        all_, 
        key=lambda r: (r['PrimaryId'], r[' LastName'])):
    d.append({
        'PrimaryId': k[0],
        'LastName': k[1],
        'CarName': [e[' CarName'] for e in g]
        })

Will group by primary id and last name, and make a list of cars.

Once you have something like this, you can just use json.dumps().

Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
2

Your data, converted to valid csv is saved in data.csv:

PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spike
100,John,Smith,NewYork,BMW,Spike
100,John,Smith,NewYork,Toyota,Rusty
100,John,Smith,NewYork,BMW,Rusty
101,Ben,Swan,Sydney,Volkswagen,Buddy
101,Ben,Swan,Sydney,Ford,Buddy
101,Ben,Swan,Sydney,Audi,Buddy
101,Ben,Swan,Sydney,Volkswagen,Max
101,Ben,Swan,Sydney,Ford,Max
101,Ben,Swan,Sydney,Audi,Max
102,Julia,Brown,London,Mini,Lucy

Using pandas to do the heavy lifting, and assuming this valid csv file, this is one way of doing what you want:

import json
import pandas as pd

df = pd.read_csv('data.csv')

def get_nested_rec(key, grp):
    rec = {}
    rec['PrimaryId'] = key[0]
    rec['FirstName'] = key[1]
    rec['LastName'] = key[2]
    rec['City'] = key[3]

    for field in ['CarName','DogName']:
        rec[field] = list(grp[field].unique())

    return rec

records = []
for key, grp in df.groupby(['PrimaryId','FirstName','LastName','City']):
    rec = get_nested_rec(key, grp)
    records.append(rec)

records = dict(data = records)

print(json.dumps(records, indent=4))

And the result:

{
    "data": [
        {
            "City": "NewYork", 
            "FirstName": "John", 
            "PrimaryId": 100, 
            "LastName": "Smith", 
            "CarName": [
                "Toyota", 
                "BMW"
            ], 
            "DogName": [
                "Spike", 
                "Rusty"
            ]
        }, 
        {
            "City": "Sydney", 
            "FirstName": "Ben", 
            "PrimaryId": 101, 
            "LastName": "Swan", 
            "CarName": [
                "Volkswagen", 
                "Ford", 
                "Audi"
            ], 
            "DogName": [
                "Buddy", 
                "Max"
            ]
        }, 
        {
            "City": "London", 
            "FirstName": "Julia", 
            "PrimaryId": 102, 
            "LastName": "Brown", 
            "CarName": [
                "Mini"
            ], 
            "DogName": [
                "Lucy"
            ]
        }
    ]
}
daedalus
  • 10,873
  • 5
  • 50
  • 71