0

I am trying to read a JSON dataset (see below a part of it). I want to use it in a flattened Pandas DataFrame to have access to all columns, in particular "A" and "B "with some data as columns for further processing.

import pandas as pd

datajson= {
    "10001": {
      "extra": {"user": "Tom"},
      "data":{"A":5, "B":10}
    },
    "10002":{
      "extra": {"user": "Ben"},
      "data":{"A":7, "B":20}
    },
    "10003":{
      "extra": {"user": "Ben"},
      "data":{"A":6, "B":15}
    }
  }
df = pd.read_json(datajson, orient='index')
# same with DataFrame.from_dict
# df2 = pd.DataFrame.from_dict(datajson, orient='index')

which results in Dataframe.

I am assuming there is a simple way without looping/appending and making a complicated and slow decoder but rather using for example Panda's json_normalize().

1 Answers1

0

I don't think you will be able to do that without looping through the json. You can do that relatively efficiently though if you make use of a list comprehension:

def parse_inner_dictionary(data):
    return pd.concat([pd.DataFrame(i, index=[0]) for i in data.values()], axis=1)


df = pd.concat([parse_inner_dictionary(v) for v in datajson.values()])
df.index = datajson.keys()
print(df)

      user  A   B
10001  Tom  5  10
10002  Ben  7  20
10003  Ben  6  15
Ben Jeffrey
  • 714
  • 9
  • 18