Easiest way to split JSON file using Python

Question

I am working on an interactive visualization of the World Happiness Report for the years 2015 to 2020. The data was split into 6 CSV files. Using pandas, I have successfully cleaned the data and concatenated it into one big JSON file with the following format:

[
  {
    "Country": "Switzerland",
    "Year": 2015,
    "Happiness Rank": 1,
    "Happiness Score": 7.587000000000001
  },
  {
    "Country": "Iceland",
    "Year": 2015,
    "Happiness Rank": 2,
    "Happiness Score": 7.561
  },
  {
    "Country": "Switzerland",
    "Year": 2016,
    "Happiness Rank": 2,
    "Happiness Score": 7.5089999999999995
  },
  {
    "Country": "Iceland",
    "Year": 2016,
    "Happiness Rank": 3,
    "Happiness Score": 7.501
  },
  {
    "Country": "Switzerland",
    "Year": 2017,
    "Happiness Rank": 3,
    "Happiness Score": 7.49399995803833
  },
  {
    "Country": "Iceland",
    "Year": 2017,
    "Happiness Rank": 1,
    "Happiness Score": 7.801
  }
]

Now, I would like to programmatically restructure the JSON file so that it has the following format:

{
    "2015": {
        "Switzerland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.587000000000001
        },
        "Iceland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.561
        }
    },
    "2016": {
        "Switzerland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.5089999999999995
        },
        "Iceland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.501
        }
    },
    "2017": {
        "Switzerland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.49399995803833
        },
        "Iceland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.801
        }
    }
}

It has to be done programmatically, since there are over 900 distinct (country, year) pairs. I want the JSON in this format because it makes the file more readable and makes it easier to select the appropriate data. For example, if I want the rank of Iceland in 2015, I can then do data["2015"]["Iceland"]["Happiness Rank"].
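One detail worth knowing for that lookup: JSON object keys are always strings, so once the nested file is read back, the years are "2015", "2016", and so on, not integers. A minimal sketch using only the standard json module, round-tripping a small hypothetical subset of the target structure through json.dumps and json.loads (the same conversion happens with json.dump/json.load on a file):

```python
import json

# Hypothetical subset of the target structure, deliberately using an int year key
nested = {
    2015: {"Iceland": {"Happiness Rank": 2, "Happiness Score": 7.561}},
}

# Round-trip through JSON text, as happens when the file is written and read back
text = json.dumps(nested, indent=4)
data = json.loads(text)

# The int key 2015 was serialized as the string "2015"
print(list(data))
print(data["2015"]["Iceland"]["Happiness Rank"])
```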

Does anyone know the easiest / most convenient way to do this in Python?

score 1 · Accepted Answer · answered Jan 14 '21 at 16:12

If data is your original list of dictionaries:

from itertools import groupby
from operator import itemgetter


def by_year(data):
    retain_keys = ("Happiness Rank", "Happiness Score")

    # groupby yields one (year, records) pair per run of consecutive records with the same year
    for year, group in groupby(data, key=itemgetter("Year")):
        # Map each country to a dict containing only the retained fields
        yield str(year), {d["Country"]: {k: d[k] for k in retain_keys} for d in group}


print(dict(by_year(data)))

Output:

{'2015': {'Switzerland': {'Happiness Rank': 1, 'Happiness Score': 7.587000000000001}, 'Iceland': {'Happiness Rank': 2, 'Happiness Score': 7.561}}, '2016': {'Switzerland': {'Happiness Rank': 2, 'Happiness Score': 7.5089999999999995}, 'Iceland': {'Happiness Rank': 3, 'Happiness Score': 7.501}}, '2017': {'Switzerland': {'Happiness Rank': 3, 'Happiness Score': 7.49399995803833}, 'Iceland': {'Happiness Rank': 1, 'Happiness Score': 7.801}}}

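This assumes that the dictionaries in data are already grouped together by year. itertools.groupby only groups consecutive items with the same key, so if they are not, sort them first with sorted(data, key=itemgetter("Year")).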
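A minimal sketch of the whole pipeline with this approach, assuming the records may arrive out of year order: sort, group, then serialize with indent=4 to get the pretty-printed layout from the question. The sample records are a reordered subset of the question's data.

```python
import json
from itertools import groupby
from operator import itemgetter

# Records deliberately out of year order (subset of the question's data)
data = [
    {"Country": "Switzerland", "Year": 2016, "Happiness Rank": 2, "Happiness Score": 7.5089999999999995},
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587000000000001},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]

retain_keys = ("Happiness Rank", "Happiness Score")

# groupby only merges consecutive items with equal keys, so sort by year first
# (sorted is stable, so countries keep their original relative order within a year)
ordered = sorted(data, key=itemgetter("Year"))

result = {
    str(year): {d["Country"]: {k: d[k] for k in retain_keys} for d in group}
    for year, group in groupby(ordered, key=itemgetter("Year"))
}

# indent=4 produces the nested, pretty-printed layout shown in the question
text = json.dumps(result, indent=4)
print(text)
```

Without the sorted call, groupby would see four runs (2016, 2015, 2016, 2015), and the later runs would overwrite the earlier ones in the dict, leaving only Iceland under each year. To write a file instead of a string, json.dump(result, f, indent=4) takes an open file object.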

score 0 · Answer 2 · edited Jan 14 '21 at 14:36


I assume you still have the original pandas DataFrame from which this JSON was created. With pandas, you can do grouped = df.groupby(['Year', 'Country']). You can then follow the procedure in "pandas groupby to nested json" to convert the result to nested JSON.
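A minimal sketch of the pandas route, assuming pandas is installed and the DataFrame has the columns shown in the question. As a variation on the answer, it groups by "Year" only and then uses set_index("Country") with DataFrame.to_dict(orient="index") to build the per-country dicts, rather than grouping on both columns. The sample records are a subset of the question's data.

```python
import pandas as pd

# Subset of the question's records
records = [
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587000000000001},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
    {"Country": "Switzerland", "Year": 2016, "Happiness Rank": 2, "Happiness Score": 7.5089999999999995},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
]
df = pd.DataFrame(records)

value_cols = ["Happiness Rank", "Happiness Score"]

# For each year, index that year's rows by country so that
# to_dict(orient="index") yields {country: {column: value}}
nested = {
    str(year): group.set_index("Country")[value_cols].to_dict(orient="index")
    for year, group in df.groupby("Year")
}
print(nested["2015"]["Iceland"])
```

to_dict(orient="index") requires a unique index, so this assumes each country appears at most once per year.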

edited Jan 14 '21 at 14:36 by Dharman · answered Jan 14 '21 at 14:30 by Akash Haridas

score 0 · Answer 3 · answered Jan 14 '21 at 14:44

You might find groupby from the itertools module useful. I was able to do this with:

import itertools
groups = itertools.groupby(data, lambda x: x["Year"])
newdict = {str(year): {entry["Country"]:entry for entry in group} for year, group in groups}

where data is the list of dictionaries in the format of your example.

This keeps the original "Year" and "Country" fields in each inner dict, but they can easily be deleted like this:

for countries in newdict.values():
    for c in countries.values():
        del c["Year"]
        del c["Country"]
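Note that the comprehension stores the same dict objects that are in data, so the del loop also removes "Year" and "Country" from the original records. A minimal sketch of a variant that builds fresh inner dicts instead, leaving data untouched (the drop set is an illustrative name, and the records are a subset of the question's data; like the code above, it assumes data is already grouped by year):

```python
import itertools

# Subset of the question's records, already grouped by year
data = [
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587000000000001},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]

# Fields that become keys of the nested structure and are therefore left out of the inner dicts
drop = {"Year", "Country"}

groups = itertools.groupby(data, lambda x: x["Year"])
newdict = {
    str(year): {
        # Copy every field except the dropped ones into a new dict
        entry["Country"]: {k: v for k, v in entry.items() if k not in drop}
        for entry in group
    }
    for year, group in groups
}
print(newdict)
```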
