-1

A sample of the JSON is as shown below:

{
  "AN": {
    "dates": {
      "2020-03-26": {
        "delta": {
          "confirmed": 1
        },
        "total": {
          "confirmed": 1
        }
      }
     }
    },
    
    "KA": {
        "dates": {
          "2020-03-09": {
            "delta": {
                "confirmed": 1
        },
        "total": {
          "confirmed": 1
        }
      },
      "2020-03-10": {
        "delta": {
          "confirmed": 3
        },
        "total": {
          "confirmed": 4
        }
      }
     }
    }
}

I would like to load it into a DataFrame, such that the state names (AN, KA) are represented as Row names, and the dates and nested entries are present as Columns.

Any tips to achieve this would be very much appreciated. [I am aware of json_normalize, however I haven't figured out how to work it out yet.]

The output I am expecting, is roughly as shown below:

DataFrame Expected

Hazim Ahmed
  • 105
  • 2
  • 8

2 Answers2

2

Can you update your post with the DataFrame you have in mind ? It'll be easier to understand what you want.

Also sometimes it's better to reshape your data if you can't make it work the way they are now.

Update:

Following your update here's what you can do.

  • You need to reshape your data, as I said when you can't achieve what you want it is best to look at the problem from another point of view. For instance (and from the sample you shared) the 'dates' keys is meaningless as the other keys are already dates and there are no other keys ate the same level.
  • A way to achieve what you want would be to use MultiIndex, it'll help you group your data the way you want. To use it you can for instance create all the indices you need and store in a dictionary the values associated.

Example :

If the only index you have is ('2020-03-26', 'delta', 'confirmed') you should have values = {'AN' : [1], 'KA':None}

Then you only need to create your DataFrame and transpose it.

I gave it a quick try and came up with a piece of code that should work. If you're looking for performance I don't think this will do the trick.

import pandas as pd

# d is the sample you shared

index = [[],[],[]]
values = {}
# Get all the dates
dates = [date for c in d.keys() for date in d[c]['dates'].keys() ]

for country in d.keys():
    # For each country we create an array containing all 6 values for each date
    # (missing values as None)
    values[country] = []
    for date in dates:
        if date in d[country]['dates']:
            for method in ['delta', 'total']:
                for step in ['confirmed', 'recovered', 'tested']:
                    # Incrementing indices 
                    index[0].append(date)
                    index[1].append(method) 
                    index[2].append(step)
                    if step in value.keys():
                        values[country].append(deepcopy(d[country]['dates'][date][method][step]))
                    else : 
                        values[country].append(None)
        # When country does not have a date fill with None
        else : 
            for method in ['delta', 'total']:
                for step in ['confirmed', 'recovered', 'tested']:
                    index[0].append(date)
                    index[1].append(method) 
                    index[2].append(step)
                    values[country].append(None)

# Removing duplicates introduced because we added n_countries times
# the indices 
# 3 is the number of steps
# 2 is the number of methods

number_of_rows = 3*2*len(dates)
index[0] = index[0][:number_of_rows]
index[1] = index[1][:number_of_rows]
index[2] = index[2][:number_of_rows]

df = pd.DataFrame(values, index=index).T

Here is what I have for the transposed data frame of my output :

results

Hope this can help you

Luka Barisic
  • 258
  • 3
  • 10
1

You clearly needs to reshape your json data before load it into a DataFrame. Have you tried load your json like a dict ?

dataframe = pd.DataFrame.from_dict(JsonDict, orient="index")

The “orient” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

João Santos
  • 194
  • 11
  • So I gave this a shot, the output is as follows: dates AN {'2020-03-26': {'delta': {'confirmed': 1}, 'to... AP {'2020-03-12': {'delta': {'confirmed': 1}, 'to... AR {'2020-04-02': {'delta': {'confirmed': 1}, 'to... Is there any way to reshape the columns so that I can store entries correctly? Right now, the state names are correctly present as rows, but there is a single column called dates with all of the entries present in it (plus the formatting isn't correct, it is stored as dict key and values) – Hazim Ahmed Sep 23 '20 at 02:20
  • 1
    Take a look here -> https://stackoverflow.com/questions/24988131/nested-dictionary-to-multiindex-dataframe-where-dictionary-keys-are-column-label – João Santos Sep 23 '20 at 13:30