I am importing a json file into a python3 jupyter notebook. The json file has the format

  1. object
    • rooms [26 elements]
      • 0
        • turns
          • fromBathroom
          • fromParking
        • distances
          • dfromBathroom
          • dfromParking
        • depth
        • area
      • 1
        • .... etc.
    • name

I am importing the json file in this way:

import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize

with open("rooms.json") as file:
  data = json.load(file)
df = json_normalize(data['rooms'])

I am now trying to plot each of the 6 dimensions against each other in a matrix-like format, with 36 total graphs.

I am trying to do this the following way:

col_features = ['fromBathroom', 'fromParking', 'dfromBathroom', 'dfromParking', 'depth', 'area']
pd.plotting.scatter_matrix(df[col_features], alpha = .2, figsize = (14,8))

This does not work, as I am getting an error that reads: KeyError: "['fromBathroom' 'fromParking' 'dfromBathroom' 'dfromParking'] not in index"

This is because those features are nested in 'turns' and 'distances' in the json file. Is there a way to un-nest these features so that I can index into the dataframe the same way I can for depth and area to get the values?

Thank you for any insights.

Thérèse Mills

1 Answer

Maybe you could extract df1 = df['turns'], df2 = df['distances'] and df3 = df[['area', 'depth']], and then do df4 = pd.concat([df1, df2, df3], join='inner', axis=1) (see the pandas docs),

or directly: pd.concat([df['turns'], df['distances'], df[['area', 'depth']]], join='inner', axis=1)

EDIT:

I tried something; I hope it is what you are looking for:

link to the image with the code and the results I get with Jupyter

df1 = df['turns']
df2 = df['distances']
df3 = pd.DataFrame(df['depth'])
df4 = pd.DataFrame(df['area'])
df_recomposed = pd.concat([df1, df2, df3, df4], join='inner', axis=1)
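A possible explanation for the KeyError on df['turns'], sketched under the assumption that 'turns' and 'distances' are plain nested objects in the JSON: json_normalize flattens nested dicts into dot-separated column names such as 'turns.fromBathroom', so there is no column called 'turns' to select. The sample data below is hypothetical (the real rooms.json is not shown), and stripping the prefix assumes the inner names are unique across groups.

```python
import pandas as pd

# Hypothetical two-room sample mirroring the structure described in the question;
# the real rooms.json has 26 rooms and its values are not shown in the thread.
data = {
    "name": "example",
    "rooms": [
        {
            "turns": {"fromBathroom": 1, "fromParking": 2},
            "distances": {"dfromBathroom": 3.5, "dfromParking": 7.0},
            "depth": 2,
            "area": 12.0,
        },
        {
            "turns": {"fromBathroom": 0, "fromParking": 4},
            "distances": {"dfromBathroom": 1.0, "dfromParking": 9.5},
            "depth": 3,
            "area": 20.0,
        },
    ],
}

# pd.json_normalize is the top-level name in pandas >= 1.0; older versions
# expose the same function as pandas.io.json.json_normalize, as in the question.
df = pd.json_normalize(data["rooms"])

# Nested keys are flattened with '.' as the separator,
# e.g. 'turns.fromBathroom' and 'distances.dfromParking'.
dotted_columns = list(df.columns)
print(dotted_columns)

# Keep only the part after the last dot so the short names work as column keys.
# Assumption: the inner names do not collide between 'turns' and 'distances'.
df.columns = [col.split(".")[-1] for col in df.columns]

col_features = ["fromBathroom", "fromParking", "dfromBathroom",
                "dfromParking", "depth", "area"]
features = df[col_features]
print(features)
```

If the columns do come out this way, features can be passed to pd.plotting.scatter_matrix as in the question. Indexing with the dotted names directly (for example df['turns.fromBathroom']) should work as well, without any renaming.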

Alternatively, see the question "Pandas - How to flatten a hierarchical index in columns", where

df.columns = [' '.join(col).strip() for col in df.columns.values]

should be what you are looking for, provided the columns really are hierarchical.
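For reference, here is a small sketch of what that join does. It only applies when the columns are a pandas MultiIndex (tuples of labels); the frame below is built by hand with made-up values purely to illustrate that case, and is not what the question's import necessarily produces.

```python
import pandas as pd

# Hypothetical frame with two-level (hierarchical) column labels.
columns = pd.MultiIndex.from_tuples([
    ("turns", "fromBathroom"),
    ("turns", "fromParking"),
    ("distances", "dfromBathroom"),
    ("distances", "dfromParking"),
])
df_multi = pd.DataFrame([[1, 2, 3.5, 7.0], [0, 4, 1.0, 9.5]], columns=columns)

# Each element of df_multi.columns.values is a tuple such as
# ('turns', 'fromBathroom'); joining with a space gives one flat string.
df_multi.columns = [" ".join(col).strip() for col in df_multi.columns.values]
print(list(df_multi.columns))
```

On flat string column names the same comprehension would join the individual characters of each name with spaces, so it is worth checking type(df.columns) before applying it.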

Philippe
  • That's a good idea! Trying to do that now, except when I index into the dataframe with 'turns' and 'distances', I get a KeyError. When I say `print(df['area'])`, I get a series object of the correct values. Not sure why one works and the other doesn't :( – Thérèse Mills Jul 16 '19 at 19:44
  • Try converting each Series into a DataFrame: [see this Stack Overflow thread](https://stackoverflow.com/questions/26097916/convert-pandas-series-to-dataframe?answertab=active#tab-top) – Philippe Jul 16 '19 at 19:48