1

Assume that I have a pandas DataFrame called df that looks something like:

source      tables      columns      
src1        table1      col1       
src1        table1      col2
src1        table2      col1 
src2        table1      col1
src2        table1      col2

My current code below can iterate through the list of sources and nest the list of tables within each source as an object:

data = [
    {k: v} 

    for k, v in df.groupby('source')['tables'].agg(
        lambda x: {v: {} for v in x}).items()
    ]

    with open('data.json', 'w') as f:
        json.dump(data, f, indent = 2)

The output I'm receiving with this code is as follows:

[
  {
    "src1": {
      "table1": {},
      "table2": {}
    }
  },
  {
    "src2": {
      "table1": {},
    }
  }
]

My desired output:

[
  {
    "src1": {
      "table1": {
         "col1": {},
         "col2": {}
     },
      "table2": {
         "col1": {}
     }
    }
  },
  {
    "src2": {
      "table1": {
         "col1": {}
      }
    }
  }
]

Any assistance in converting my 2-layer nested JSON file to 3 layers as shown above would be greatly appreciated. Thank you in advance.

cs95
  • 379,657
  • 97
  • 704
  • 746

1 Answers1

2

Since you have multiple levels of grouping here, I'd recommend just using a for loop to iterate over your data.

from collections import defaultdict  

def make_nested(df): 
    f = lambda: defaultdict(f)   
    data = f()  

    for row in df.to_numpy().tolist():
        t = data
        for r in row[:-1]:
            t = t[r]
        t[row[-1]] = {}

    return data

print(json.dumps(make_nested(df), indent=2))
{
  "src1": {
    "table1": {
      "col1": {},
      "col2": {}
    },
    "table2": {
      "col1": {}
    }
  },
  "src2": {
    "table1": {
      "col1": {},
      "col2": {}
    }
  }
}

This assumes your columns are arranged from left to right: outermost keys to innermost key.

cs95
  • 379,657
  • 97
  • 704
  • 746
  • Appreciate it. I also would only need this for the first 3 columns however as I have other columns in the DataFrame as well. Can you edit your answer to have the grouping for just the first 3 columns? Thanks – weovibewvoibweoivwoiv Mar 01 '20 at 21:40
  • 1
    @weovibewvoibweoivwoiv pass df[[col1, col2, col3]] to the function, that should do it. – cs95 Mar 01 '20 at 22:23
  • posted another follow-up question, hoping you can help answer, thanks. https://stackoverflow.com/questions/60493133/how-to-nest-pandas-dataframe-converted-to-numpy-values-into-inner-most-json-ob – weovibewvoibweoivwoiv Mar 02 '20 at 16:43